shiwk24 committed on
Commit
edf45ba
·
verified ·
1 Parent(s): 041e2e1

update README

Files changed (1)
  1. README.md +3 -9
README.md CHANGED
@@ -4,17 +4,15 @@ language:
  license: apache-2.0
  tags:
  - mathematics
- - multimodal-reasoning
+ - reasoning
+ - multi-modal
+ - image-text-interleave
  - visual-chain-of-thought
  - vcot
- - generative-ai
- - computer-vision
- - bagel
  - mathcanvas
  pipeline_tag: any-to-any
  library_name: transformers
  ---
-
  # BAGEL-Canvas
 
  <p align="center">
@@ -30,7 +28,6 @@ library_name: transformers
  <img src="https://img.shields.io/badge/GitHub-Code-green.svg" alt="GitHub Code">
  </a>
  </p>
-
  ## 📖 Overview
 
  **BAGEL-Canvas** is a powerful Unified Large Multimodal Model (ULMM) endowed with intrinsic **Visual Chain-of-Thought (VCoT)** capabilities for complex mathematical reasoning. It is the flagship model trained using the comprehensive **[MathCanvas]** framework.
@@ -43,7 +40,6 @@ Unlike prior models that often fail by generating incorrect (e.g., BAGEL-Zebra-C
  Comparison of different models on a geometry problem. BAGEL-Canvas ("Ours") is the only model that generates a correct and strategically useful diagram to solve the problem.
  </figcaption>
  </figure>
-
  ## 🚀 Training Recipe
 
  BAGEL-Canvas is trained following the two-stage **MathCanvas** framework, designed to systematically build its visual reasoning abilities.
@@ -56,10 +52,8 @@ BAGEL-Canvas is trained following the two-stage **MathCanvas** framework, design
  This foundational stage trains the model to master the core skills of diagram generation and editing. It involves pre-training on a massive 15.2M-pair corpus, which includes:
  - **MathCanvas-Imagen (10M pairs):** Teaches text-to-diagram generation.
  - **MathCanvas-Edit (5.2M pairs):** Teaches step-by-step diagram editing and manipulation.
-
  2. **Stage II: Strategic Visual-Aided Reasoning (Fine-tuning)**
  In this stage, the model learns *when* and *how* to strategically deploy its visual skills to solve problems. It is fine-tuned on **MathCanvas-Instruct**, a high-quality dataset of 219K examples featuring interleaved visual-textual reasoning paths, teaching it when and how to generate a complete VCoT solution.
-
  ## 🏆 Performance
 
  BAGEL-Canvas demonstrates a significant leap in visual mathematical reasoning.