| --- |
| license: apache-2.0 |
| language: |
| - en |
| base_model: |
| - Qwen/Qwen3-8B |
| pipeline_tag: image-text-to-text |
| tags: |
| - Bee-8B |
| - Fully-Open-MLLMs |
| datasets: |
| - Open-Bee/Honey-Data-15M |
| library_name: transformers |
| --- |
| # Bee: A High-Quality Corpus and Full-Stack Suite to Unlock Advanced Fully Open MLLMs |
|
|
[[🏠 Homepage](https://open-bee.github.io/)] [[📄 Arxiv Paper](https://arxiv.org/pdf/2510.13795)] [[🤗 Models & Datasets](https://huggingface.co/collections/Open-Bee/bee-8b-68ecbf10417810d90fbd9995)] [[💻 Code](https://github.com/Open-Bee)]
|
|
|
|
| ## Introduction |
|
|
| We introduce **Bee-8B**, a new state-of-the-art, fully open 8B Multimodal Large Language Model (MLLM) designed to close the performance gap with proprietary models by focusing on data quality. |
|
|
| Bee-8B is trained on our new **Honey-Data-15M** corpus, a high-quality supervised fine-tuning (SFT) dataset of approximately 15 million samples. This dataset was meticulously created with our transparent, adaptable, and open-source data curation pipeline, **HoneyPipe**, which systematically cleans noisy data and enriches it with a novel dual-level (short and long) Chain-of-Thought (CoT) strategy. |
|
|
| This dataset enables Bee-8B to achieve exceptional performance, particularly in complex reasoning, establishing a new standard for fully open MLLMs. |
|
|
| ## Key Features |
|
|
| - **High-Quality, Large-Scale Dataset:** We release **Honey-Data-15M**, a new 15M-sample SFT corpus. It has undergone extensive cleaning to remove widespread noise and has been enriched with dual-level CoT reasoning to enhance advanced problem-solving capabilities. |
| - **Fully Open-Source Data Curation Suite:** We provide not just the data, but the entire methodology. **HoneyPipe** and its underlying framework **DataStudio** offer the community a transparent and reproducible pipeline, moving beyond static dataset releases. |
| - **State-of-the-Art Open Model:** Our model, **Bee-8B**, achieves state-of-the-art performance among fully open MLLMs and is highly competitive with recent semi-open models like InternVL3.5-8B, demonstrating the power of high-quality data. |
|
|
| ## News |
|
|
- **[2025.12.17]** 🔥 We have released all data and model weights across different stages. For the final stage (RL data), you can directly merge [ViRL39K](https://huggingface.co/datasets/TIGER-Lab/ViRL39K) and [MMK12](https://huggingface.co/datasets/FanqingM/MMK12) and use the [VeRL](https://github.com/volcengine/verl) framework for training.
|
|
- **[2025.11.03]** 🚀 **[Honey-Data-15M](https://huggingface.co/datasets/Open-Bee/Honey-Data-15M) & [Honey-Data-1M](https://huggingface.co/datasets/Open-Bee/Honey-Data-1M) are Released!** You can download the 15M full version and the 1M efficient version from [Hugging Face](https://huggingface.co/collections/Open-Bee/bee-8b-68ecbf10417810d90fbd9995).
|
|
- **[2025.10.20]** 🚀 **vLLM Support is Here!** Bee-8B now supports high-performance inference with [vLLM](https://github.com/vllm-project/vllm), enabling faster and more efficient deployment for production use cases.
|
|
- **[2025.10.13]** 🚀 **Bee-8B is Released!** Our model is now publicly available. You can download it from [Hugging Face](https://huggingface.co/collections/Open-Bee/bee-8b-68ecbf10417810d90fbd9995).
|
|
| ## Bee-8B-Stage1 |
|
|
| > [!IMPORTANT] |
| > **This is NOT a complete model and cannot be used for inference directly.** |
|
|
|
|
| This repository contains the MLP projector weights that bridge the vision encoder ([SigLIP2](https://huggingface.co/google/siglip2-so400m-patch14-384)) and the language model ([Qwen3-8B](https://huggingface.co/Qwen/Qwen3-8B)). |
|
|
| **Weights:** |
| | Key | Shape | Description | |
| |-----|-------|-------------| |
| | `model.multi_modal_projector.pre_norm.weight` | [1152] | Pre-normalization weight | |
| | `model.multi_modal_projector.pre_norm.bias` | [1152] | Pre-normalization bias | |
| | `model.multi_modal_projector.linear_1.weight` | [4096, 1152] | First linear layer | |
| | `model.multi_modal_projector.linear_1.bias` | [4096] | First linear bias | |
| | `model.multi_modal_projector.linear_2.weight` | [4096, 4096] | Second linear layer | |
| | `model.multi_modal_projector.linear_2.bias` | [4096] | Second linear bias | |
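From the weight shapes above, the projector appears to be a two-layer MLP that maps 1152-dimensional SigLIP2 features into the 4096-dimensional Qwen3-8B embedding space. The sketch below reconstructs that module in PyTorch purely from the table; the normalization type (`LayerNorm`) and the activation between the two linear layers (`GELU`) are assumptions, not confirmed by this repository, so treat this as an illustrative shape-check rather than the official implementation:

```python
import torch
import torch.nn as nn


class ProjectorSketch(nn.Module):
    """Hypothetical reconstruction of the Bee-8B Stage-1 MLP projector.

    Shapes match the weight table: pre_norm over 1152 dims,
    linear_1: 1152 -> 4096, linear_2: 4096 -> 4096.
    LayerNorm and GELU are assumptions inferred from common MLLM designs.
    """

    def __init__(self, vision_dim: int = 1152, text_dim: int = 4096):
        super().__init__()
        self.pre_norm = nn.LayerNorm(vision_dim)   # [1152] weight + bias
        self.linear_1 = nn.Linear(vision_dim, text_dim)  # [4096, 1152]
        self.act = nn.GELU()
        self.linear_2 = nn.Linear(text_dim, text_dim)    # [4096, 4096]

    def forward(self, vision_features: torch.Tensor) -> torch.Tensor:
        # vision_features: (batch, num_patches, 1152) from the vision encoder
        x = self.pre_norm(vision_features)
        x = self.act(self.linear_1(x))
        return self.linear_2(x)  # (batch, num_patches, 4096)


projector = ProjectorSketch()
patch_tokens = torch.randn(1, 729, 1152)  # dummy stand-in for SigLIP2 patch tokens
projected = projector(patch_tokens)
print(projected.shape)  # torch.Size([1, 729, 4096])
```

If the shapes or naming differ from the released checkpoint, the table above is authoritative; this sketch only shows how such weights would typically be wired between the vision encoder and the language model.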
|
|
| ## Acknowledgements |
|
|
| Bee-8B is developed based on the architectures and codebases of the following projects: [R-4B](https://huggingface.co/YannQi/R-4B), [LLaVA-OneVision](https://github.com/LLaVA-VL/LLaVA-NeXT), [SigLIP2](https://huggingface.co/google/siglip2-so400m-patch14-384), [Qwen3](https://github.com/QwenLM/Qwen3), and evaluated using [VLMEvalKit](https://github.com/open-compass/VLMEvalKit). We sincerely thank these projects for their outstanding contributions to the open-source community. |