SmallEvals — QA Generation Model (Qwen3-0.6B)

Repository (HF): mburaksayici/golden_generate_qwen_0.6b_v3
Repository (GGUF): mburaksayici/golden_generate_qwen_0.6b_v3_gguf
Base Model: Qwen3-0.6B
Format: FP16 + GGUF quantizations
Primary Use: QA generation for RAG evaluation

This model is part of SmallEvals — an open-source framework for evaluating retrieval-augmented generation systems by generating high-quality golden QA datasets.


Overview

This model is fine-tuned to extract a single, atomic question-answer pair from a passage.

Designed for:

  • Golden dataset generation (see the sketch after this list)
  • RAG benchmarking
  • Chunk validation
  • Evaluation corpus bootstrapping
  • Retriever quality testing
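
As a concrete illustration of the golden-dataset loop, here is a minimal sketch. The `generate_qa` callable is a hypothetical placeholder for whichever inference backend you use (Transformers, llama.cpp, or Ollama, all shown later in this card); it is assumed to return the model's raw text completion for a prompt.

import json

PROMPT = (
    "Given the passage below, extract ONE question/answer pair "
    "grounded strictly in a single atomic fact.\n\n"
    'PASSAGE:\n"{passage}"\n\n'
    "Return ONLY a JSON object."
)

def build_golden_set(chunks, generate_qa):
    """Collect one QA pair per chunk, skipping completions that are not valid JSON."""
    golden = []
    for i, chunk in enumerate(chunks):
        raw = generate_qa(PROMPT.format(passage=chunk))
        try:
            pair = json.loads(raw)
        except json.JSONDecodeError:
            continue  # the model occasionally emits malformed output; drop those chunks
        if isinstance(pair, dict) and {"question", "answer"} <= pair.keys():
            golden.append({"passage_id": i, **pair})
    return golden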

Training Data

The model was trained on:

  • TriviaQA
  • SQuAD 2.0
  • Hand-curated synthetic data generated using Qwen-70B

Training focused on:

  • Grounded questions
  • Single-fact answers
  • JSON-only outputs
  • Minimal verbosity

Prompt Template

The model expects the following instruction format:

Given the passage below, extract ONE question/answer pair grounded strictly in a single atomic fact.

PASSAGE:
"<.<passage>.>"

Return ONLY a JSON object.

Example Output

{
  "question": "When was the Eiffel Tower completed?",
  "answer": "1889"
}
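
The model is trained to emit JSON only, so the completion can usually be parsed directly. The defensive sketch below (not part of any official tooling) also falls back to the first {...} span in case stray text surrounds the object.

import json
import re

def parse_qa(completion: str) -> dict | None:
    """Parse the model's completion into a {'question', 'answer'} dict, or None."""
    try:
        obj = json.loads(completion)
    except json.JSONDecodeError:
        # Fall back to the first {...} span if stray text surrounds the object
        match = re.search(r"\{.*\}", completion, re.DOTALL)
        if not match:
            return None
        try:
            obj = json.loads(match.group(0))
        except json.JSONDecodeError:
            return None
    return obj if isinstance(obj, dict) else None

print(parse_qa('{"question": "When was the Eiffel Tower completed?", "answer": "1889"}'))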

Inference

llama.cpp

llama-cli \
  -hf mburaksayici/golden_generate_qwen_0.6b_v3_gguf \
  -p "Given the passage below, extract ONE question/answer pair grounded strictly in a single atomic fact.\n\nPASSAGE:\n\"The Eiffel Tower was completed in 1889.\"\n\nReturn ONLY a JSON object."

Ollama

This repository includes an Ollama Modelfile. You can also pull the GGUF directly from Hugging Face:

ollama run hf.co/mburaksayici/golden_generate_qwen_0.6b_v3_gguf
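
Once the model is running under Ollama, it can also be called from Python over Ollama's local REST API. A minimal sketch, assuming the default endpoint at http://localhost:11434 and the model name used in the command above:

import requests

prompt = (
    "Given the passage below, extract ONE question/answer pair "
    "grounded strictly in a single atomic fact.\n\n"
    'PASSAGE:\n"The Eiffel Tower was completed in 1889."\n\n'
    "Return ONLY a JSON object."
)

resp = requests.post(
    "http://localhost:11434/api/generate",
    json={
        "model": "hf.co/mburaksayici/golden_generate_qwen_0.6b_v3_gguf",
        "prompt": prompt,
        "stream": False,  # return the full completion in a single response
    },
    timeout=120,
)
print(resp.json()["response"])  # raw completion, expected to be a JSON object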

Hugging Face Transformers

from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "mburaksayici/golden_generate_qwen_0.6b_v3"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, device_map="auto")

prompt = """Given the passage below, extract ONE question/answer pair grounded strictly in a single atomic fact.

PASSAGE:
"The Great Wall of China was built over several centuries."

Return ONLY a JSON object.
"""

inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=128)

# Decode only the newly generated tokens so the output is just the model's completion
completion = tokenizer.decode(outputs[0][inputs["input_ids"].shape[1]:], skip_special_tokens=True)
print(completion)

Available Files (GGUF)

File                     Description
Qwen3-0.6B.F16.gguf      Full precision (FP16)
Qwen3-0.6B.Q8_0.gguf     Best-quality quantization
Qwen3-0.6B.Q5_K_M.gguf   Balanced size and quality
Qwen3-0.6B.Q4_K_M.gguf   Fast and compact
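
For Python use of the GGUF files directly, llama-cpp-python can load any of the quantizations above. A minimal sketch, assuming Qwen3-0.6B.Q4_K_M.gguf has already been downloaded locally from the GGUF repository:

from llama_cpp import Llama

# Assumes the Q4_K_M file was downloaded from the GGUF repository beforehand
llm = Llama(model_path="Qwen3-0.6B.Q4_K_M.gguf", n_ctx=2048, verbose=False)

prompt = (
    "Given the passage below, extract ONE question/answer pair "
    "grounded strictly in a single atomic fact.\n\n"
    'PASSAGE:\n"The Eiffel Tower was completed in 1889."\n\n'
    "Return ONLY a JSON object."
)

out = llm(prompt, max_tokens=128, temperature=0.0)  # greedy decoding for stable JSON
print(out["choices"][0]["text"])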

Intended Use

✅ RAG evaluation
✅ QA dataset generation
✅ Retriever testing (see the sketch after this list)
✅ Chunk quality scoring
✅ Benchmark creation
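
As an example of retriever testing, a golden set produced by this model can score a retriever by checking whether each question's source passage is retrieved. A minimal recall@k sketch, where `retrieve` is a hypothetical (query, k) -> ranked passage IDs callable:

def recall_at_k(golden, retrieve, k=5):
    """Fraction of golden questions whose source passage appears in the top-k results.

    `golden` is a list of {"passage_id", "question", "answer"} dicts produced
    by this model; `retrieve` is a hypothetical (query, k) -> [passage_id] callable.
    """
    hits = 0
    for item in golden:
        top_k = retrieve(item["question"], k)
        if item["passage_id"] in top_k:
            hits += 1
    return hits / len(golden) if golden else 0.0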


Not Intended For

❌ Chatbots
❌ Creative writing
❌ Long-form summarization
❌ General instruction following (beyond the QA-extraction prompt)
❌ Multi-hop reasoning


License

Apache-2.0
(Base model license applies)



Related Projects

  • SmallEvals
  • EvalVD
  • ChunkTuner
  • Golden-QAG
  • RAG-Boilerplate
