Sai345/math-tutor-sft-dataset
Viewer โข Updated โข 2.15k โข 44
How to use Sai345/llama-3.1-8b-math-tutor with PEFT:
from peft import PeftModel
from transformers import AutoModelForCausalLM
base_model = AutoModelForCausalLM.from_pretrained("unsloth/llama-3.1-8b-instruct-bnb-4bit")
model = PeftModel.from_pretrained(base_model, "Sai345/llama-3.1-8b-math-tutor")How to use Sai345/llama-3.1-8b-math-tutor with Unsloth Studio:
curl -fsSL https://unsloth.ai/install.sh | sh # Run unsloth studio unsloth studio -H 0.0.0.0 -p 8888 # Then open http://localhost:8888 in your browser # Search for Sai345/llama-3.1-8b-math-tutor to start chatting
irm https://unsloth.ai/install.ps1 | iex # Run unsloth studio unsloth studio -H 0.0.0.0 -p 8888 # Then open http://localhost:8888 in your browser # Search for Sai345/llama-3.1-8b-math-tutor to start chatting
# No setup required # Open https://huggingface.co/spaces/unsloth/studio in your browser # Search for Sai345/llama-3.1-8b-math-tutor to start chatting
pip install unsloth
from unsloth import FastModel
model, tokenizer = FastModel.from_pretrained(
model_name="Sai345/llama-3.1-8b-math-tutor",
max_seq_length=2048,
)A fine-tuned version of Llama 3.1 8B Instruct trained to function as Ravi, a math tutoring assistant specializing in algebra, calculus, and word problems. Trained with QLoRA via Unsloth + TRL SFTTrainer on Google Colab T4.
Ravi teaches rather than just answers โ it scaffolds understanding, asks checkpoint questions, handles student misconceptions with a 4-tier escalation protocol, and redirects out-of-domain queries.
| Property | Value |
|---|---|
| Base model | unsloth/llama-3.1-8b-instruct-bnb-4bit |
| Fine-tuning method | QLoRA (4-bit NF4 quantization) |
| LoRA rank | 16 |
| LoRA alpha | 16 |
| LoRA dropout | 0 |
| Target modules | q_proj, k_proj, v_proj, o_proj, gate_proj, up_proj, down_proj |
| Training steps | 200 |
| Learning rate | 2e-4 (cosine schedule, 10 warmup steps) |
| Effective batch size | 8 (2 per device ร 4 gradient accumulation) |
| Optimizer | AdamW 8-bit |
| Max sequence length | 2,048 tokens |
| Packing | Disabled |
| Loss masking | Student turns masked (-100); model trains on teacher responses only |
| Platform | Google Colab T4 (15GB VRAM, ~13.6GB used) |
| Framework | Unsloth + PEFT + TRL SFTTrainer |
Training data: Sai345/math-tutor-sft-dataset
| Source | Examples | Description |
|---|---|---|
MathDial (eth-nlped/mathdial) |
1,696 | Multi-turn math tutoring dialogues. Filtered to self_correctness == "Yes" only. Converted from pipe-delimited format to Llama 3.1 chat template. Teacher tags stripped. License: CC-BY-SA 4.0 |
| Synthetic (Groq Llama 4 Scout 17B) | 455 | 5 typed categories: algebra scaffolding (150), word problem scaffolding (80), misconception correction (120), difficulty adaptation (80), OOD refusal (25). All follow the Ravi persona and 4-tier escalation protocol. |
| Total | 2,151 | 1,935 train / 216 test (90/10 split) |
Each example is a full multi-turn conversation formatted as a single training instance with the system prompt embedded.
pip install transformers peft bitsandbytes accelerate