shiwk24/BAGEL-Canvas

@@ -4,17 +4,15 @@ language:
 license: apache-2.0
 tags:
 - mathematics
-- multimodal-reasoning
 - visual-chain-of-thought
 - vcot
-- generative-ai
-- computer-vision
-- bagel
 - mathcanvas
 pipeline_tag: any-to-any
 library_name: transformers
 ---
 # BAGEL-Canvas
 <p align="center">
@@ -30,7 +28,6 @@ library_name: transformers
     <img src="https://img.shields.io/badge/GitHub-Code-green.svg" alt="GitHub Code">
   </a>
 </p>
 ## 📖 Overview
 **BAGEL-Canvas** is a powerful Unified Large Multimodal Model (ULMM) endowed with intrinsic **Visual Chain-of-Thought (VCoT)** capabilities for complex mathematical reasoning. It is the flagship model trained using the comprehensive **[MathCanvas]** framework.
@@ -43,7 +40,6 @@ Unlike prior models that often fail by generating incorrect (e.g., BAGEL-Zebra-C
     Comparison of different models on a geometry problem. BAGEL-Canvas ("Ours") is the only model that generates a correct and strategically useful diagram to solve the problem.
   </figcaption>
 </figure>
 ## 🚀 Training Recipe
 BAGEL-Canvas is trained following the two-stage **MathCanvas** framework, designed to systematically build its visual reasoning abilities.
@@ -56,10 +52,8 @@ BAGEL-Canvas is trained following the two-stage **MathCanvas** framework, design
     This foundational stage trains the model to master the core skills of diagram generation and editing. It involves pre-training on a massive 15.2M-pair corpus, which includes:
     - **MathCanvas-Imagen (10M pairs):** Teaches text-to-diagram generation.
     - **MathCanvas-Edit (5.2M pairs):** Teaches step-by-step diagram editing and manipulation.
 2.  **Stage II: Strategic Visual-Aided Reasoning (Fine-tuning)**
     In this stage, the model learns *when* and *how* to strategically deploy its visual skills to solve problems. It is fine-tuned on **MathCanvas-Instruct**, a high-quality dataset of 219K examples featuring interleaved visual-textual reasoning paths, teaching it when and how to generate a complete VCoT solution.
 ## 🏆 Performance
 BAGEL-Canvas demonstrates a significant leap in visual mathematical reasoning.

 license: apache-2.0
 tags:
 - mathematics
+- reasoning
+- multi-modal
+- image-text-interleave
 - visual-chain-of-thought
 - vcot
 - mathcanvas
 pipeline_tag: any-to-any
 library_name: transformers
 ---
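The first hunk of this commit rewrites the `tags` list in the README's YAML front matter. The sketch below uses plain Python and only the standard library. It compares the old and new tag lists, which are copied by hand from the two sides of the diff, and reports which tags were dropped, added, or kept. The helper names `tag_changes` and `tags_block` are hypothetical and are not part of any Hugging Face Hub tooling.

```python
# Tag lists copied by hand from the two sides of this diff (not read from the Hub).
OLD_TAGS = [
    "mathematics",
    "multimodal-reasoning",
    "visual-chain-of-thought",
    "vcot",
    "generative-ai",
    "computer-vision",
    "bagel",
    "mathcanvas",
]
NEW_TAGS = [
    "mathematics",
    "reasoning",
    "multi-modal",
    "image-text-interleave",
    "visual-chain-of-thought",
    "vcot",
    "mathcanvas",
]


def tag_changes(old, new):
    """Hypothetical helper: return (removed, added, kept), each in original order."""
    old_set, new_set = set(old), set(new)
    removed = [t for t in old if t not in new_set]  # only on the old side
    added = [t for t in new if t not in old_set]    # only on the new side
    kept = [t for t in old if t in new_set]         # unchanged context tags
    return removed, added, kept


def tags_block(tags):
    """Hypothetical helper: render tags in the card's YAML block-sequence style."""
    return "\n".join(["tags:"] + [f"- {t}" for t in tags])


if __name__ == "__main__":
    removed, added, kept = tag_changes(OLD_TAGS, NEW_TAGS)
    print("removed:", removed)
    print("added:", added)
    print("kept:", kept)
    print(tags_block(NEW_TAGS))
```

When run, the script reports four removed tags (`multimodal-reasoning`, `generative-ai`, `computer-vision`, `bagel`) and three added tags (`reasoning`, `multi-modal`, `image-text-interleave`). The remaining tags are unchanged. It then prints the new `tags:` block. Both helpers return lists instead of sets so that the output keeps the same order as the front matter.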