SmolLM-135M-GEC-SFT-DPO

A style-preserving grammar correction model based on SmolLM-135M, trained with SFT + DPO to make minimal, targeted corrections while preserving your original writing style.

Why This Model?

Unlike large language models (GPT, Claude, etc.), which tend to rewrite entire sentences, this model makes minimal, targeted corrections: it fixes only grammatical errors while preserving your vocabulary, tone, and voice. It is well suited to:

Legal documents: Maintain precise legal terminology
Academic writing: Preserve scholarly tone
ESL/EFL education: Help learners without changing their ideas
Professional communications: Keep your authentic voice

Key Features

Minimal corrections: Fixes only grammatical errors and does not rewrite your sentences
Style preservation: Maintains your vocabulary, tone, and voice
Small & efficient: Only 135M parameters (~500MB) - runs on CPU!
BLEU score: ~0.50 (self-reported) on grammar correction benchmarks
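
The ~500MB figure can be sanity-checked from the parameter count. The sketch below assumes the nominal 135M parameters and 4 bytes per parameter (F32, the tensor type listed for this checkpoint); `checkpoint_size_bytes` is an illustrative helper, not part of any library.

```python
def checkpoint_size_bytes(num_params: int, bytes_per_param: int = 4) -> int:
    """Approximate weight storage, ignoring file headers and metadata."""
    return num_params * bytes_per_param


# Assumption: nominal 135M parameters stored as 32-bit floats.
size = checkpoint_size_bytes(135_000_000)
print(f"{size / 1e6:.0f} MB")     # decimal megabytes -> 540 MB
print(f"{size / 2**20:.0f} MiB")  # binary mebibytes  -> 515 MiB
```

Both values are in line with the rounded "~500MB" quoted above. Loading the weights in a 16-bit type would roughly halve the figure.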

Quick Start

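from transformers import AutoModelForCausalLM, AutoTokenizer

# Load the fine-tuned model and its tokenizer from the Hugging Face Hub.
model = AutoModelForCausalLM.from_pretrained("DanJZY/SmolLM-135M-GEC-SFT-DPO")
tokenizer = AutoTokenizer.from_pretrained("DanJZY/SmolLM-135M-GEC-SFT-DPO")

text = "As the number of people grows, the need of habitable environment is essential."
# Prefix the sentence with the "Fix grammar: " instruction.
inputs = tokenizer(f"Fix grammar: {text}", return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=100)

# generate() returns the prompt followed by the new tokens, so slice off the prompt.
prompt_length = inputs["input_ids"].shape[1]
print(tokenizer.decode(outputs[0][prompt_length:], skip_special_tokens=True))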

Example: Style-Preserving vs Over-Correction

Original (with error):
"As the number of people grows, the need of habitable environment is essential."

✅ Our Model (Style-Preserving):
"As the number of people grows, the need for a habitable environment is essential."
                                         ↑
                            Only fixes "of" → "for a"

❌ Typical Model (Over-Correction):
"As population growth continues, the necessity for a habitable environment becomes essential."
                            ↑
    Completely rewrites: changes vocabulary, structure, and tone
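
To make the difference between the two outputs concrete, a word-level diff can list exactly which spans changed. This is a minimal sketch using Python's standard `difflib`; the helper name `word_edits` is illustrative and is not part of the model or its repository.

```python
import difflib


def word_edits(source: str, corrected: str) -> list[tuple[str, str]]:
    """Return (before, after) pairs for every word span that differs."""
    src, out = source.split(), corrected.split()
    matcher = difflib.SequenceMatcher(a=src, b=out, autojunk=False)
    return [
        (" ".join(src[i1:i2]), " ".join(out[j1:j2]))
        for tag, i1, i2, j1, j2 in matcher.get_opcodes()
        if tag != "equal"  # keep only replaced, inserted, or deleted spans
    ]


source = "As the number of people grows, the need of habitable environment is essential."
minimal = "As the number of people grows, the need for a habitable environment is essential."
rewritten = "As population growth continues, the necessity for a habitable environment becomes essential."

print(word_edits(source, minimal))         # [('of', 'for a')]
print(len(word_edits(source, rewritten)))  # several separate edit spans
```

The style-preserving output yields a single edit span, while the over-corrected output yields several. A count like this is only a rough proxy for how invasive a correction is.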

Training Details

Parameter	Value
Base model	SmolLM-135M
Training method	SFT + DPO (Direct Preference Optimization)
Preference pairs	~19,000 (generated using edit distance)
Total experiments	28 (22 SFT + 6 DPO/IPO)
Hardware	8x RTX 3090
Training time	~3 hours
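
The table notes that the preference pairs were generated using edit distance but does not describe the procedure. The sketch below shows one plausible reading, offered only as an assumption: given two candidate corrections for the same source sentence, the candidate with the smaller token-level Levenshtein distance to a reference correction becomes "chosen" and the other "rejected". The helper names and the `prompt`/`chosen`/`rejected` keys are illustrative.

```python
def edit_distance(a: list[str], b: list[str]) -> int:
    """Token-level Levenshtein distance via the standard DP recurrence."""
    prev = list(range(len(b) + 1))
    for i, tok_a in enumerate(a, start=1):
        curr = [i]
        for j, tok_b in enumerate(b, start=1):
            cost = 0 if tok_a == tok_b else 1
            curr.append(min(
                prev[j] + 1,         # delete tok_a
                curr[j - 1] + 1,     # insert tok_b
                prev[j - 1] + cost,  # substitute, or match at no cost
            ))
        prev = curr
    return prev[-1]


def make_preference_pair(source, reference, cand_a, cand_b):
    """Hypothetical helper: prefer the candidate closer to the reference."""
    dist_a = edit_distance(cand_a.split(), reference.split())
    dist_b = edit_distance(cand_b.split(), reference.split())
    if dist_a == dist_b:
        return None  # no clear preference, so skip this pair
    chosen, rejected = (cand_a, cand_b) if dist_a < dist_b else (cand_b, cand_a)
    return {"prompt": f"Fix grammar: {source}", "chosen": chosen, "rejected": rejected}


source = "As the number of people grows, the need of habitable environment is essential."
reference = "As the number of people grows, the need for a habitable environment is essential."
rewritten = "As population growth continues, the necessity for a habitable environment becomes essential."

pair = make_preference_pair(source, reference, reference, rewritten)
print(pair["chosen"])
```

Under this assumption, an over-corrected candidate is rejected even when it is grammatical, which pushes the model toward minimal edits.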

Resources

Resource	Link
GitHub Repository	ZhuoyuanJiang/SmolLM-GEC-SFT-DPO
Full Experiment Checkpoints	Google Drive (~68GB)
Training Notebooks	GitHub notebooks/

Intended Use

Grammar correction for English text
Writing assistance that preserves the author's voice
Educational tools for language learners
Proofreading applications

Limitations

English only
Best for sentence-level corrections
Not designed for stylistic improvements (only grammar)
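
Because the model works best on single sentences, longer text can be split and corrected one sentence at a time. The following is a rough sketch: the splitter is deliberately naive (it will mis-split abbreviations such as "e.g."), and `correct_sentence` is a hypothetical callable that would wrap the Quick Start generation code.

```python
import re
from typing import Callable


def split_sentences(text: str) -> list[str]:
    """Naive splitter: break after '.', '!' or '?' followed by whitespace."""
    parts = re.split(r"(?<=[.!?])\s+", text.strip())
    return [p for p in parts if p]


def correct_document(text: str, correct_sentence: Callable[[str], str]) -> str:
    """Apply a sentence-level corrector to each sentence and rejoin with spaces."""
    return " ".join(correct_sentence(s) for s in split_sentences(text))


# Stand-in corrector for demonstration only; replace with a call to the model.
print(correct_document("He go home. She like tea!", lambda s: s))
```

Rejoining with single spaces discards the original paragraph breaks, so text with meaningful layout would need the splitter to track whitespace as well.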

Citation

@misc{smollm_gec_sft_dpo_2025,
  title={SmolLM-135M-GEC-SFT-DPO: Style-Preserving Grammar Correction with Direct Preference Optimization},
  author={Zhuoyuan Jiang},
  year={2025},
  url={https://huggingface.co/DanJZY/SmolLM-135M-GEC-SFT-DPO},
  note={Fine-tuned SmolLM-135M for minimal, style-preserving grammatical error correction}
}

Acknowledgments

Special thanks to Nima Tajbakhsh (NVIDIA) for guidance on efficient training methods.

Model Details

Field	Value
Weights format	Safetensors
Model size	0.1B params
Tensor type	F32
Base model	HuggingFaceTB/SmolLM-135M
BLEU (self-reported)	0.500