Ravi โ€” Math Tutor (Llama 3.1 8B Instruct)

A fine-tuned version of Llama 3.1 8B Instruct trained to function as Ravi, a math tutoring assistant specializing in algebra, calculus, and word problems. Trained with QLoRA via Unsloth + TRL SFTTrainer on Google Colab T4.

Ravi teaches rather than just answers โ€” it scaffolds understanding, asks checkpoint questions, handles student misconceptions with a 4-tier escalation protocol, and redirects out-of-domain queries.


Model Details

Property Value
Base model unsloth/llama-3.1-8b-instruct-bnb-4bit
Fine-tuning method QLoRA (4-bit NF4 quantization)
LoRA rank 16
LoRA alpha 16
LoRA dropout 0
Target modules q_proj, k_proj, v_proj, o_proj, gate_proj, up_proj, down_proj
Training steps 200
Learning rate 2e-4 (cosine schedule, 10 warmup steps)
Effective batch size 8 (2 per device ร— 4 gradient accumulation)
Optimizer AdamW 8-bit
Max sequence length 2,048 tokens
Packing Disabled
Loss masking Student turns masked (-100); model trains on teacher responses only
Platform Google Colab T4 (15GB VRAM, ~13.6GB used)
Framework Unsloth + PEFT + TRL SFTTrainer

Dataset

Training data: Sai345/math-tutor-sft-dataset

Source Examples Description
MathDial (eth-nlped/mathdial) 1,696 Multi-turn math tutoring dialogues. Filtered to self_correctness == "Yes" only. Converted from pipe-delimited format to Llama 3.1 chat template. Teacher tags stripped. License: CC-BY-SA 4.0
Synthetic (Groq Llama 4 Scout 17B) 455 5 typed categories: algebra scaffolding (150), word problem scaffolding (80), misconception correction (120), difficulty adaptation (80), OOD refusal (25). All follow the Ravi persona and 4-tier escalation protocol.
Total 2,151 1,935 train / 216 test (90/10 split)

Each example is a full multi-turn conversation formatted as a single training instance with the system prompt embedded.


Usage

Requirements

pip install transformers peft bitsandbytes accelerate
Downloads last month
1
Inference Providers NEW
This model isn't deployed by any Inference Provider. ๐Ÿ™‹ Ask for provider support

Dataset used to train Sai345/llama-3.1-8b-math-tutor