# llama3.2_socratic_dpo_unsloth
A fine-tuned version of Llama 3.2 3B Instruct trained with DPO (Direct Preference Optimization) using Unsloth, optimized to act as a Socratic tutor: it guides students through the reasoning rather than giving answers directly.
## Training Details
| Detail | Value |
|---|---|
| Base model | unsloth/Llama-3.2-3B-Instruct (Llama 3.2 3B Instruct) |
| Method | DPO (Direct Preference Optimization) |
| Framework | Unsloth |
| Quantization | 4-bit (inference) |
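
The exact training recipe (dataset, LoRA configuration, hyperparameters) is not published in this card. As a rough orientation, a DPO run with Unsloth typically looks like the sketch below; the toy preference pair, LoRA rank, and `DPOConfig` values are assumptions for illustration, not the values used for this model.

```python
# Illustrative training sketch only -- not the published recipe for this model.
from unsloth import FastLanguageModel, PatchDPOTrainer
PatchDPOTrainer()  # patch TRL's DPOTrainer with Unsloth's memory/speed optimizations

from datasets import Dataset
from trl import DPOConfig, DPOTrainer

model, tokenizer = FastLanguageModel.from_pretrained(
    model_name = "unsloth/Llama-3.2-3B-Instruct",
    max_seq_length = 2048,
    load_in_4bit = True,
)
model = FastLanguageModel.get_peft_model(
    model,
    r = 16,
    lora_alpha = 16,
    target_modules = ["q_proj", "k_proj", "v_proj", "o_proj",
                      "gate_proj", "up_proj", "down_proj"],
)

# Toy preference pair: "chosen" guides Socratically, "rejected" blurts the answer.
preference_dataset = Dataset.from_dict({
    "prompt":   ["True or false: 17 is a prime number."],
    "chosen":   ["What happens if you try dividing 17 by each prime below its square root?"],
    "rejected": ["True. 17 is prime."],
})

trainer = DPOTrainer(
    model = model,
    ref_model = None,  # with LoRA adapters, the frozen base weights act as the reference policy
    args = DPOConfig(output_dir = "outputs", per_device_train_batch_size = 2, beta = 0.1),
    train_dataset = preference_dataset,
    processing_class = tokenizer,  # `tokenizer=` on older TRL versions
)
trainer.train()
```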
## Usage
### 1. Install dependencies
```python
%%capture
!pip install unsloth
!pip install --upgrade transformers
```
### 2. Load the model
```python
from unsloth import FastLanguageModel

model1, tokenizer1 = FastLanguageModel.from_pretrained(
    model_name = "Bialy17/llama3.2_socratic_dpo_unsloth",
    max_seq_length = 2048,
    dtype = None,        # auto-detect (float16 on Colab T4)
    load_in_4bit = True, # saves VRAM; works fine on the free T4
)
FastLanguageModel.for_inference(model1)  # enable Unsloth's optimized inference mode
```
### 3. CLI Chat (with streaming)
```python
import torch
from transformers import TextStreamer

text_streamer = TextStreamer(tokenizer1, skip_prompt=True)

SYSTEM_MSG = (
    "You are a helpful Tutor that never gives the final answer directly "
    "to exam-type questions (MCQ, fill-in-the-blank, true/false, etc.). "
    "Instead, guide the student using the Socratic method."
)

chat_history = [
    {"role": "system", "content": SYSTEM_MSG}
]

print("=" * 50)
print(" Socratic Tutor | 'exit' to quit | 'clear' to reset")
print("=" * 50 + "\n")

while True:
    user_input = input("Student: ").strip()
    if not user_input:
        continue
    if user_input.lower() in ("exit", "quit", "q"):
        print("Goodbye!")
        break
    if user_input.lower() == "clear":
        chat_history = [chat_history[0]]  # keep the system message
        print("\n--- History Cleared ---\n")
        continue

    chat_history.append({"role": "user", "content": user_input})

    inputs = tokenizer1.apply_chat_template(
        chat_history,
        tokenize = True,
        add_generation_prompt = True,
        return_tensors = "pt",
    ).to("cuda")

    print("\nTutor: ", end="", flush=True)
    with torch.no_grad():
        outputs = model1.generate(
            input_ids = inputs,
            streamer = text_streamer,
            max_new_tokens = 512,  # cap reply length; otherwise generate() can stop at the config's short default max_length
            temperature = 0.1,
            do_sample = True,
            pad_token_id = tokenizer1.eos_token_id,
        )

    # Save only the newly generated tokens to history
    response = tokenizer1.decode(
        outputs[0][inputs.shape[-1]:],
        skip_special_tokens=True,
    ).strip()
    chat_history.append({"role": "assistant", "content": response})
    print()
```
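### 4. Single-turn helper (optional)

For non-interactive use, the same pieces can be wrapped into a one-shot function. This is a sketch rather than part of the original card: the `ask_tutor` name, the sample question, and the `max_new_tokens` default are illustrative. It reuses `model1`, `tokenizer1`, `torch`, and `SYSTEM_MSG` from the steps above.

```python
def ask_tutor(question: str, max_new_tokens: int = 512) -> str:
    """Ask one question and return the tutor's reply as a string (no streaming)."""
    messages = [
        {"role": "system", "content": SYSTEM_MSG},
        {"role": "user", "content": question},
    ]
    inputs = tokenizer1.apply_chat_template(
        messages,
        tokenize = True,
        add_generation_prompt = True,
        return_tensors = "pt",
    ).to("cuda")
    with torch.no_grad():
        outputs = model1.generate(
            input_ids = inputs,
            max_new_tokens = max_new_tokens,
            temperature = 0.1,
            do_sample = True,
            pad_token_id = tokenizer1.eos_token_id,
        )
    # Decode only the tokens generated after the prompt
    return tokenizer1.decode(
        outputs[0][inputs.shape[-1]:],
        skip_special_tokens = True,
    ).strip()

print(ask_tutor("True or false: every prime number is odd."))
```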
## Notes

- Runs on Google Colab T4 (free tier) with `load_in_4bit=True`
- Type `clear` during chat to reset the conversation history while keeping the system prompt
- Best suited for exam-style Q&A, tutoring sessions, and guided reasoning tasks