---
license: cc
language:
- en
base_model:
- Qwen/Qwen2.5-3B
tags:
- qwen2
- qwen
- text-generation
- question-answering
- research
- engineering
- lora
- 4bit
- bitsandbytes
- faiss
- rag
metrics:
- type: rougeL
  value: 57.2
- type: bleu
  value: 42.8
library_name: transformers
---

# 🛰️ ResearchQwen 2.5-3B-LoRA

**Compact, domain-expert Q&A for systems researchers.**

- **Base model:** [Qwen/Qwen2.5-3B](https://huggingface.co/Qwen/Qwen2.5-3B)
- **Tuning recipe:** 4-bit **QLoRA** with **bitsandbytes** NF4 quantisation
- **Retriever:** FAISS cosine-similarity index over ~33 k document chunks

---

## 🚀 Quick inference

```python
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig, pipeline

model_id = "Programmer-RD-AI/ResearchQwen2.5-3B-LoRA"

tok = AutoTokenizer.from_pretrained(model_id, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    device_map="auto",
    torch_dtype="auto",
    quantization_config=BitsAndBytesConfig(load_in_4bit=True),  # bitsandbytes NF4
)

qa = pipeline("text-generation", model=model, tokenizer=tok)
print(qa("Explain how Chain Replication with Apportioned Queries improves tail-latency.")[0]["generated_text"])
```

### llama.cpp / GGUF

```bash
wget https://huggingface.co/Programmer-RD-AI/ResearchQwen2.5-3B-LoRA/resolve/main/model_Q4_K_M.gguf
./main -m model_Q4_K_M.gguf -p "Give the core idea of the 3FS log-structured layout in 3 sentences."
```

---

## 📚 Training data

| Source                     | Docs   | Words     |
| -------------------------- | ------ | --------- |
| 3FS white-paper            | 14     | 162 k     |
| CRAQ spec + benchmarks     | 11     | 119 k     |
| Distributed AI infra notes | 32     | 287 k     |
| *Total*                    | **57** | **568 k** |

Synthetic Q&A pairs were generated with an instruction template tuned for factual density; unhelpful pairs were filtered via a weak-to-strong scoring cascade (ROUGE-L > 0.4, BLEU > 0.35).

---

## 🛠️ Fine-tuning details

| Setting     | Value                                           |
| ----------- | ----------------------------------------------- |
| GPU         | 1× A100 40 GB                                   |
| Precision   | 4-bit NF4 w/ double-quant (bitsandbytes 0.45.4) |
| LoRA r/α    | 64 / 16                                         |
| LR schedule | cosine, 5 % warm-up                             |
| Steps       | 1 100                                           |
| Epochs      | 3                                               |
| Peak VRAM   | 21 GB                                           |

---

## 📈 Evaluation

| Metric  | Base Qwen2.5-3B | **This model** |
| ------- | --------------- | -------------- |
| ROUGE-L | 45.6            | **57.2**       |
| BLEU-4  | 30.4            | **42.8**       |

> See `eval/` for scripts and raw scores (ROUGE, BLEU); a rough reproduction sketch is also included near the bottom of this card.

---

## 🔗 Integration recipe (RAG)

```python
from langchain.vectorstores import FAISS  # or llama-index
from langchain.embeddings import HuggingFaceEmbeddings

# `texts` is your list of pre-chunked document strings.
emb = HuggingFaceEmbeddings(model_name="BAAI/bge-small-en-v1.5")
vs = FAISS.from_texts(texts, emb)
```

An end-to-end retrieval-plus-generation sketch is shown near the bottom of this card.

Average retriever + generator latency: 330 ms (GPU), 1.9 s (CPU, GGUF int4).

---

## 💡 Why it should trend

* **Fresh domain niche** – deep systems-engineering Q&A is underserved on HF.
* **Ultra-portable** – 4-bit LoRA + GGUF = laptop-friendly.
* **Full-stack repo** – weights, notebook, RAG demo, eval scripts.
* **Eye-catching tags** – `qwen2`, `lora`, `rag`, `research` map directly to popular HF filters and the trending feed.
* **Clear usage code** – copy-run experience = more downloads.

---

## ⚠️ Limitations & responsible use

* Trained solely on English; non-English queries degrade sharply.
* Answers may quote or paraphrase the training documents verbatim.
* Not suitable for critical medical or legal advice.
* LoRA adapters are GPL-3.0; commercial use must comply with both GPL-3.0 and the Qwen 2.5 base-model license.
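
---

## 🧪 End-to-end RAG example (sketch)

The integration recipe above only builds the vector store; the snippet below stitches it together with the quick-inference pipeline. Treat it as a minimal sketch: the placeholder corpus `texts`, the retrieval depth `k=4`, and the prompt template are illustrative assumptions, not part of the released repo.

```python
from langchain.vectorstores import FAISS
from langchain.embeddings import HuggingFaceEmbeddings
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig, pipeline

model_id = "Programmer-RD-AI/ResearchQwen2.5-3B-LoRA"

# Generator: same 4-bit setup as the quick-inference example above.
tok = AutoTokenizer.from_pretrained(model_id, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    device_map="auto",
    torch_dtype="auto",
    quantization_config=BitsAndBytesConfig(load_in_4bit=True),
)
generate = pipeline("text-generation", model=model, tokenizer=tok)

# Retriever: FAISS store over your own pre-chunked corpus (placeholder chunks here).
texts = [
    "CRAQ lets every replica in the chain serve reads once a version is clean ...",
    "3FS appends updates to a log-structured layout and compacts segments later ...",
]
emb = HuggingFaceEmbeddings(model_name="BAAI/bge-small-en-v1.5")
vs = FAISS.from_texts(texts, emb)

def answer(question: str, k: int = 4) -> str:
    """Retrieve the top-k chunks and prepend them to the question."""
    chunks = vs.similarity_search(question, k=k)
    context = "\n\n".join(doc.page_content for doc in chunks)
    prompt = f"Context:\n{context}\n\nQuestion: {question}\nAnswer:"
    return generate(prompt, max_new_tokens=256)[0]["generated_text"]

print(answer("How does CRAQ keep reads consistent while spreading them across the chain?"))
```

---

## 🔁 Reproducing the evaluation (sketch)

The scores in the table above come from the scripts in `eval/`, which remain the reference. As a rough illustration only, assuming the Hugging Face `evaluate` library and hypothetical `predictions` / `references` lists built from the held-out Q&A pairs, ROUGE-L and BLEU can be computed like this:

```python
import evaluate

# Hypothetical lists: model answers vs. gold answers for the held-out Q&A pairs.
predictions = ["CRAQ apportions reads across the chain ..."]
references = ["CRAQ lets every node in the chain serve clean reads ..."]

rouge = evaluate.load("rouge")
bleu = evaluate.load("bleu")

print(rouge.compute(predictions=predictions, references=references)["rougeL"])
print(bleu.compute(predictions=predictions, references=[[r] for r in references])["bleu"])
```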
---

## ✍️ Citation

```bibtex
@misc{ranuga_disansa_gamage_2025,
  author    = {Ranuga Disansa Gamage and Rivindu Ashinsa and Thuan Naheem and Sanila Wijesekara},
  title     = {ResearchQwen-2.5-3B-LoRA (Revision 7ea9f5f)},
  year      = 2025,
  url       = {https://huggingface.co/Programmer-RD-AI/ResearchQwen-2.5-3B-LoRA},
  doi       = {10.57967/hf/5623},
  publisher = {Hugging Face}
}
```