SmolVLM-LaTeX (256M)
SmolVLM-LaTeX is a lightweight vision-language model (256M parameters) fine-tuned to transcribe handwritten mathematical equations directly into valid LaTeX code.
It transforms images of math problems into copy-pasteable LaTeX strings, making it ideal for digitizing notes and academic content on resource-constrained devices.
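For instance, a photo of the handwritten quadratic formula would ideally come back as a string such as x = \frac{-b \pm \sqrt{b^{2} - 4ac}}{2a} (an illustrative target output, not a captured model response).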
How to Use (* may not work properly yet)
Note: Training is still in progress; an improved version is coming soon.
import torch
from transformers import AutoProcessor, AutoModelForVision2Seq
from PIL import Image

# Pick the device once so the script also runs on CPU-only machines
DEVICE = "cuda" if torch.cuda.is_available() else "cpu"

# Load processor and model
model_id = "TEEN-D/smolvlm-256m-latex"
processor = AutoProcessor.from_pretrained(model_id)
model = AutoModelForVision2Seq.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,
    _attn_implementation="flash_attention_2" if DEVICE == "cuda" else "eager",
).to(DEVICE)

# Load the image of the handwritten equation
image = Image.open("equation.png")

# Build a chat-style prompt with one image and one instruction
messages = [
    {
        "role": "user",
        "content": [
            {"type": "image"},
            {"type": "text", "text": "Convert this equation into LaTeX."}
        ]
    },
]

# Prepare inputs
prompt = processor.apply_chat_template(messages, add_generation_prompt=True)
inputs = processor(text=prompt, images=[image], return_tensors="pt").to(DEVICE)

# Generate the LaTeX transcription
generated_ids = model.generate(**inputs, max_new_tokens=128)
generated_texts = processor.batch_decode(
    generated_ids,
    skip_special_tokens=True
)
print(generated_texts[0])
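Note that batch_decode returns the full conversation, prompt included. A minimal post-processing sketch (a standard Hugging Face pattern, not specific to this model) that decodes only the newly generated tokens:

# Strip the prompt tokens so only the model's reply is decoded
prompt_len = inputs["input_ids"].shape[1]
latex_only = processor.batch_decode(
    generated_ids[:, prompt_len:],
    skip_special_tokens=True
)[0].strip()
print(latex_only)  # copy-pasteable LaTeX, ready to wrap in $...$ or \[...\]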
Intended Uses
- Digitization: Converting handwritten math notes to digital formats (a batch-processing sketch follows this list).
- Education: Helping students verify handwritten solutions.
- Data Entry: Speeding up the transcription of scientific papers.
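For the digitization use case, a minimal batch-processing sketch, assuming a hypothetical notes/ folder of PNG scans and a notes.tex output file, and reusing model, processor, prompt, and DEVICE from the snippet above:

from pathlib import Path

lines = []
for path in sorted(Path("notes").glob("*.png")):  # hypothetical input folder
    img = Image.open(path)
    batch = processor(text=prompt, images=[img], return_tensors="pt").to(DEVICE)
    out_ids = model.generate(**batch, max_new_tokens=128)
    latex = processor.batch_decode(
        out_ids[:, batch["input_ids"].shape[1]:],  # keep only generated tokens
        skip_special_tokens=True
    )[0].strip()
    lines.append(f"% {path.name}\n\\[ {latex} \\]")  # one display equation per image

Path("notes.tex").write_text("\n\n".join(lines))  # hypothetical output file

Each equation is written as a commented display-math block, so failed transcriptions are easy to spot and fix by hand.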
Model tree for Teen-Different/smolvlm-256m-latex
- Base model lineage: HuggingFaceTB/SmolLM2-135M → HuggingFaceTB/SmolLM2-135M-Instruct → HuggingFaceTB/SmolVLM-256M-Instruct → this model