update README
README.md
CHANGED
@@ -4,17 +4,15 @@ language:
 license: apache-2.0
 tags:
 - mathematics
--
 - visual-chain-of-thought
 - vcot
-- generative-ai
-- computer-vision
-- bagel
 - mathcanvas
 pipeline_tag: any-to-any
 library_name: transformers
 ---
-
 # BAGEL-Canvas
 
 <p align="center">
@@ -30,7 +28,6 @@ library_name: transformers
 <img src="https://img.shields.io/badge/GitHub-Code-green.svg" alt="GitHub Code">
 </a>
 </p>
-
 ## Overview
 
 **BAGEL-Canvas** is a powerful Unified Large Multimodal Model (ULMM) endowed with intrinsic **Visual Chain-of-Thought (VCoT)** capabilities for complex mathematical reasoning. It is the flagship model trained using the comprehensive **[MathCanvas]** framework.
@@ -43,7 +40,6 @@ Unlike prior models that often fail by generating incorrect (e.g., BAGEL-Zebra-C
 Comparison of different models on a geometry problem. BAGEL-Canvas ("Ours") is the only model that generates a correct and strategically useful diagram to solve the problem.
 </figcaption>
 </figure>
-
 ## Training Recipe
 
 BAGEL-Canvas is trained following the two-stage **MathCanvas** framework, designed to systematically build its visual reasoning abilities.
@@ -56,10 +52,8 @@ BAGEL-Canvas is trained following the two-stage **MathCanvas** framework, design
 This foundational stage trains the model to master the core skills of diagram generation and editing. It involves pre-training on a massive 15.2M-pair corpus, which includes:
 - **MathCanvas-Imagen (10M pairs):** Teaches text-to-diagram generation.
 - **MathCanvas-Edit (5.2M pairs):** Teaches step-by-step diagram editing and manipulation.
-
 2. **Stage II: Strategic Visual-Aided Reasoning (Fine-tuning)**
 In this stage, the model learns *when* and *how* to strategically deploy its visual skills to solve problems. It is fine-tuned on **MathCanvas-Instruct**, a high-quality dataset of 219K examples featuring interleaved visual-textual reasoning paths, teaching it when and how to generate a complete VCoT solution.
-
 ## Performance
 
 BAGEL-Canvas demonstrates a significant leap in visual mathematical reasoning.
@@ -4,17 +4,15 @@ language:
 license: apache-2.0
 tags:
 - mathematics
+- reasoning
+- multi-modal
+- image-text-interleave
 - visual-chain-of-thought
 - vcot
 - mathcanvas
 pipeline_tag: any-to-any
 library_name: transformers
 ---
 # BAGEL-Canvas
 
 <p align="center">
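The tag changes in the front-matter hunk can be summarized with a small order-preserving comparison. This is only an illustrative sketch: the names `old_tags`, `new_tags`, and `tag_changes` are hypothetical, and the one removed entry whose text is not legible in the diff is deliberately left out rather than guessed.

```python
# Tags as they can be read from the front-matter hunk. The removed entry
# whose text did not survive in the diff is omitted, not reconstructed.
old_tags = [
    "mathematics",
    "visual-chain-of-thought",
    "vcot",
    "generative-ai",
    "computer-vision",
    "bagel",
    "mathcanvas",
]
new_tags = [
    "mathematics",
    "reasoning",
    "multi-modal",
    "image-text-interleave",
    "visual-chain-of-thought",
    "vcot",
    "mathcanvas",
]


def tag_changes(old, new):
    """Return (added, removed) tags, keeping the order in which they are listed."""
    old_set, new_set = set(old), set(new)
    added = [tag for tag in new if tag not in old_set]
    removed = [tag for tag in old if tag not in new_set]
    return added, removed


added, removed = tag_changes(old_tags, new_tags)
print("added:  ", added)
print("removed:", removed)
```

Under these assumptions the commit swaps three general-purpose tags for three that describe the model's reasoning and interleaved image-text focus, while the `pipeline_tag` and `library_name` fields stay unchanged.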
@@ -30,7 +28,6 @@ library_name: transformers
 <img src="https://img.shields.io/badge/GitHub-Code-green.svg" alt="GitHub Code">
 </a>
 </p>
 ## Overview
 
 **BAGEL-Canvas** is a powerful Unified Large Multimodal Model (ULMM) endowed with intrinsic **Visual Chain-of-Thought (VCoT)** capabilities for complex mathematical reasoning. It is the flagship model trained using the comprehensive **[MathCanvas]** framework.
@@ -43,7 +40,6 @@ Unlike prior models that often fail by generating incorrect (e.g., BAGEL-Zebra-C
 Comparison of different models on a geometry problem. BAGEL-Canvas ("Ours") is the only model that generates a correct and strategically useful diagram to solve the problem.
 </figcaption>
 </figure>
 ## Training Recipe
 
 BAGEL-Canvas is trained following the two-stage **MathCanvas** framework, designed to systematically build its visual reasoning abilities.
@@ -56,10 +52,8 @@ BAGEL-Canvas is trained following the two-stage **MathCanvas** framework, design
 This foundational stage trains the model to master the core skills of diagram generation and editing. It involves pre-training on a massive 15.2M-pair corpus, which includes:
 - **MathCanvas-Imagen (10M pairs):** Teaches text-to-diagram generation.
 - **MathCanvas-Edit (5.2M pairs):** Teaches step-by-step diagram editing and manipulation.
 2. **Stage II: Strategic Visual-Aided Reasoning (Fine-tuning)**
 In this stage, the model learns *when* and *how* to strategically deploy its visual skills to solve problems. It is fine-tuned on **MathCanvas-Instruct**, a high-quality dataset of 219K examples featuring interleaved visual-textual reasoning paths, teaching it when and how to generate a complete VCoT solution.
 ## Performance
 
 BAGEL-Canvas demonstrates a significant leap in visual mathematical reasoning.
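The corpus sizes quoted in the Training Recipe hunk can be cross-checked with a short sketch. The `Corpus` class and the `STAGE_I` / `STAGE_II` names are hypothetical, and the integer counts are the rounded figures from the README (10M, 5.2M, 219K), not exact dataset sizes.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Corpus:
    name: str
    size: int      # rounded count quoted in the README, not an exact figure
    purpose: str


# Stage I: pre-training on diagram generation and editing.
STAGE_I = [
    Corpus("MathCanvas-Imagen", 10_000_000, "text-to-diagram generation"),
    Corpus("MathCanvas-Edit", 5_200_000, "step-by-step diagram editing"),
]

# Stage II: fine-tuning on interleaved visual-textual reasoning paths.
STAGE_II = [
    Corpus("MathCanvas-Instruct", 219_000, "interleaved visual-textual reasoning"),
]


def total_size(corpora):
    """Sum the quoted sizes of a list of corpora."""
    return sum(corpus.size for corpus in corpora)


stage_i_total = total_size(STAGE_I)
stage_ii_total = total_size(STAGE_II)

print(f"Stage I:  {stage_i_total / 1e6:.1f}M pairs")
print(f"Stage II: {stage_ii_total / 1e3:.0f}K examples")
```

The Stage I total matches the 15.2M-pair figure stated in the README, which confirms that the two listed corpora account for the whole pre-training set as described.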