SmolVLM-LaTeX (256M)

SmolVLM-LaTeX is a lightweight vision-language model (256M parameters) fine-tuned to transcribe handwritten mathematical equations directly into valid LaTeX code.

It transforms images of math problems into copy-pasteable LaTeX strings, making it ideal for digitizing notes and academic content on resource-constrained devices.

How to Use (* might not work properly)

Note: Training is still in progress; an improved version is coming soon.

import torch
from transformers import AutoProcessor, AutoModelForVision2Seq
from PIL import Image

# Load model and processor
model_id = "TEEN-D/smolvlm-256m-latex"
device = "cuda" if torch.cuda.is_available() else "cpu"

processor = AutoProcessor.from_pretrained(model_id)
model = AutoModelForVision2Seq.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,
    attn_implementation="flash_attention_2" if device == "cuda" else "eager",
).to(device)

# Inference
image = Image.open("equation.png").convert("RGB")
messages = [
    {
        "role": "user",
        "content": [
            {"type": "image"},
            {"type": "text", "text": "Convert this equation into LaTeX."}
        ]
    },
]

# Prepare inputs
prompt = processor.apply_chat_template(messages, add_generation_prompt=True)
inputs = processor(text=prompt, images=[image], return_tensors="pt").to(device)

# Generate
generated_ids = model.generate(**inputs, max_new_tokens=128)
generated_texts = processor.batch_decode(
    generated_ids, 
    skip_special_tokens=True
)

print(generated_texts[0])
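
The decoded string usually contains the chat prompt as well as the answer, since generate returns the input tokens followed by the new ones. A minimal sketch to keep only the newly generated part, reusing inputs and generated_ids from above:

prompt_len = inputs["input_ids"].shape[1]
latex_only = processor.batch_decode(
    generated_ids[:, prompt_len:],  # drop the prompt tokens
    skip_special_tokens=True,
)[0].strip()
print(latex_only)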

Intended Uses

  • Digitization: Converting handwritten math notes to digital formats (see the batch sketch after this list).
  • Education: Helping students verify handwritten solutions.
  • Data Entry: Speeding up the transcription of scientific papers.
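
For the digitization use case, here is a minimal batch-transcription sketch. It reuses model, processor, messages, and device from the "How to Use" snippet; the notes/ directory of equation images is a hypothetical example.

from pathlib import Path

prompt = processor.apply_chat_template(messages, add_generation_prompt=True)

for path in sorted(Path("notes").glob("*.png")):  # hypothetical input folder
    image = Image.open(path).convert("RGB")
    inputs = processor(text=prompt, images=[image], return_tensors="pt").to(device)
    generated_ids = model.generate(**inputs, max_new_tokens=128)
    latex = processor.batch_decode(
        generated_ids[:, inputs["input_ids"].shape[1]:],  # keep only the new tokens
        skip_special_tokens=True,
    )[0].strip()
    print(f"% {path.name}")
    print(latex)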
