llama3.2_socratic_dpo_unsloth

A fine-tuned version of Llama 3.2 trained with DPO (Direct Preference Optimization) using Unsloth, optimized to act as a Socratic tutor — guiding students through reasoning rather than giving answers directly.

Training Details

Detail          Value
Base model      Llama 3.2
Method          DPO (Direct Preference Optimization)
Framework       Unsloth
Quantization    4-bit (inference)
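DPO trains on preference pairs: for each prompt, a preferred ("chosen") reply and a dispreferred ("rejected") one. A minimal sketch of what a record for a Socratic tutor might look like, using the common prompt/chosen/rejected field convention (the actual dataset contents are an assumption here, not taken from this model's training data):

```python
# One hypothetical DPO training record. The Socratic reply is "chosen";
# the direct answer is "rejected", so DPO pushes the model toward guiding
# rather than answering.
record = {
    "prompt":   "Q: What is 7 x 8? (A) 54 (B) 56 (C) 64",
    "chosen":   "Let's reason it out: you know 7 x 7. How much more is one extra 7?",
    "rejected": "The answer is (B) 56.",
}

# Sanity-check the record has the three expected fields.
for key in ("prompt", "chosen", "rejected"):
    assert key in record
```

A dataset of such records is what a DPO trainer (e.g. TRL's DPOTrainer) consumes alongside the base model.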

Usage

1. Install dependencies

%%capture
!pip install unsloth
!pip install --upgrade transformers

2. Load the model

from unsloth import FastLanguageModel

model1, tokenizer1 = FastLanguageModel.from_pretrained(
    model_name     = "Bialy17/llama3.2_socratic_dpo_unsloth",
    max_seq_length = 2048,
    dtype          = None,       # auto-detect (float16 on Colab T4)
    load_in_4bit   = True,       # saves VRAM — works fine on free T4
)

FastLanguageModel.for_inference(model1)

3. CLI Chat (with streaming)

import torch
from transformers import TextStreamer

text_streamer = TextStreamer(tokenizer1, skip_prompt=True)

SYSTEM_MSG = (
    "You are a helpful Tutor that never gives the final answer directly "
    "to exam-type questions (MCQ, fill-in-the-blank, true/false, etc.). "
    "Instead, guide the student using the Socratic method."
)

chat_history = [
    {"role": "system", "content": SYSTEM_MSG}
]

print("=" * 50)
print("  Socratic Tutor  |  'exit' to quit  |  'clear' to reset")
print("=" * 50 + "\n")

while True:
    user_input = input("Student: ").strip()

    if not user_input:
        continue

    if user_input.lower() in ("exit", "quit", "q"):
        print("Goodbye!")
        break

    if user_input.lower() == "clear":
        chat_history = [chat_history[0]]  # keep system message
        print("\n--- History Cleared ---\n")
        continue

    chat_history.append({"role": "user", "content": user_input})

    inputs = tokenizer1.apply_chat_template(
        chat_history,
        tokenize              = True,
        add_generation_prompt = True,
        return_tensors        = "pt",
    ).to("cuda")

    print("\nTutor: ", end="", flush=True)

    with torch.no_grad():
        outputs = model1.generate(
            input_ids      = inputs,
            streamer       = text_streamer,
            max_new_tokens = 512,        # cap the reply length; without this,
                                         # generation may stop almost immediately
            temperature    = 0.1,
            do_sample      = True,
            pad_token_id   = tokenizer1.eos_token_id,
        )

    # Save only new tokens to history
    response = tokenizer1.decode(
        outputs[0][inputs.shape[-1]:],
        skip_special_tokens=True
    ).strip()

    chat_history.append({"role": "assistant", "content": response})
    print()
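Because the model is loaded with max_seq_length = 2048, a long session can eventually overflow the context window. A minimal sketch of a history-trimming helper you could call before apply_chat_template (the name trim_history and the turn cutoff are assumptions, not part of the model):

```python
def trim_history(history, max_turns=6):
    """Keep the system message plus the last `max_turns` chat messages,
    so the assembled prompt stays within the model's context window."""
    system, rest = history[:1], history[1:]
    return system + rest[-max_turns:]

# Example: one system message followed by 10 alternating chat messages.
history = [{"role": "system", "content": "tutor"}]
history += [
    {"role": "user" if i % 2 == 0 else "assistant", "content": str(i)}
    for i in range(10)
]

trimmed = trim_history(history)
print(len(trimmed))              # 7: system message + last 6 turns
print(trimmed[0]["role"])        # "system" is always preserved
```

In the loop above you would apply it as chat_history = trim_history(chat_history) just before building the prompt. A turn-count cutoff is a crude proxy for token count; a stricter version would measure tokenized length instead.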

Notes

  • Runs on the free Google Colab T4 tier with load_in_4bit=True
  • Type 'clear' during chat to reset the conversation history while keeping the system prompt
  • Best suited for exam-style Q&A, tutoring sessions, and guided reasoning tasks
