---
license: cc
language:
- en
base_model:
- Qwen/Qwen2.5-3B
tags:
- qwen2
- qwen
- text-generation
- question-answering
- research
- engineering
- lora
- 4bit
- bitsandbytes
- faiss
- rag
metrics:
- type: rougeL
  value: 57.2
- type: bleu
  value: 42.8
library_name: transformers
---

# 🛰️ ResearchQwen 2.5-3B-LoRA

**Compact, domain-expert Q&A for systems researchers.**

* Base model: [Qwen/Qwen2.5-3B](https://huggingface.co/Qwen/Qwen2.5-3B)
* Tuning recipe: 4-bit **QLoRA** with **bitsandbytes** NF4 quantisation
* Retriever: FAISS cosine-similarity store for ~33 k document chunks

---

## 🚀 Quick inference

```python
from transformers import (
    AutoModelForCausalLM,
    AutoTokenizer,
    BitsAndBytesConfig,
    pipeline,
)

model_id = "Programmer-RD-AI/ResearchQwen2.5-3B-LoRA"
tok = AutoTokenizer.from_pretrained(model_id, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    device_map="auto",
    torch_dtype="auto",
    # 4-bit loading via bitsandbytes; the bare `load_in_4bit=True` kwarg is deprecated
    quantization_config=BitsAndBytesConfig(load_in_4bit=True),
)
qa = pipeline("text-generation", model=model, tokenizer=tok)
out = qa(
    "Explain how Chain Replication with Apportioned Queries improves tail latency.",
    max_new_tokens=256,  # cap the answer length
)
print(out[0]["generated_text"])  # the pipeline returns a list of dicts
```

### llama.cpp / GGUF

```bash
wget https://huggingface.co/Programmer-RD-AI/ResearchQwen2.5-3B-LoRA/resolve/main/model_Q4_K_M.gguf
# Recent llama.cpp builds name the binary `llama-cli`; older builds call it `main`.
./llama-cli -m model_Q4_K_M.gguf -p "Give the core idea of the 3FS log-structured layout in 3 sentences."
```

---

## 📚 Training data

| Source                     | Docs   | Words     |
| -------------------------- | ------ | --------- |
| 3FS white-paper            | 14     | 162 k     |
| CRAQ spec + benchmarks     | 11     | 119 k     |
| Distributed AI infra notes | 32     | 287 k     |
| *Total*                    | **57** | **568 k** |

Synthetic Q&A pairs were generated with an instruction template tuned for factual density. Low-quality pairs were then filtered out by a weak-to-strong scoring cascade that kept only pairs scoring ROUGE-L > 0.4 and BLEU > 0.35.
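
The filtering step can be illustrated with a minimal, dependency-free sketch. It is not the pipeline used for this model: the card does not say what each generated answer was scored against, so the `reference` argument, the whitespace tokenisation, and the precomputed `bleu` value passed to the hypothetical `keep_pair` helper are all assumptions. Only the two thresholds come from the text above.

```python
def lcs_length(a, b):
    """Length of the longest common subsequence of two token sequences."""
    prev = [0] * (len(b) + 1)
    for tok_a in a:
        curr = [0]
        for j, tok_b in enumerate(b, start=1):
            if tok_a == tok_b:
                curr.append(prev[j - 1] + 1)
            else:
                curr.append(max(prev[j], curr[j - 1]))
        prev = curr
    return prev[-1]


def rouge_l_f1(candidate, reference):
    """ROUGE-L F1 on lower-cased whitespace tokens (no stemming)."""
    cand, ref = candidate.lower().split(), reference.lower().split()
    if not cand or not ref:
        return 0.0
    lcs = lcs_length(cand, ref)
    if lcs == 0:
        return 0.0
    precision, recall = lcs / len(cand), lcs / len(ref)
    return 2 * precision * recall / (precision + recall)


def keep_pair(answer, reference, bleu, rouge_min=0.4, bleu_min=0.35):
    """Assumed cascade order: check ROUGE-L first, then the supplied BLEU score."""
    if rouge_l_f1(answer, reference) <= rouge_min:
        return False
    return bleu > bleu_min
```

Both thresholds are on a 0 to 1 scale here, whereas the evaluation table reports scores on a 0 to 100 scale.
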

---

## 🛠️ Fine-tuning details

| Setting   | Value                                                      |
| --------- | ---------------------------------------------------------- |
| GPU       | 1× A100 40 GB                                              |
| Precision | 4-bit NF4 w/ double-quant (bnb 0.45.4)                     |
| LoRA r/α  | 64 / 16                                                    |
| LR sched  | cosine, 5 % warm-up                                        |
| Steps     | 1 100                                                      |
| Epochs    | 3                                                          |
| Peak VRAM | 21 GB                                                      |
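
To make the schedule and LoRA rows concrete, the sketch below computes a cosine schedule with 5 % warm-up over the 1 100 steps listed in the table, plus the scaling factor α/r implied by r = 64 and α = 16. The peak learning rate (`PEAK_LR`), the linear warm-up shape, and the decay to zero are assumptions, since the card states none of them.

```python
import math

TOTAL_STEPS = 1100          # "Steps" row
WARMUP_FRAC = 0.05          # "5 % warm-up"
LORA_R, LORA_ALPHA = 64, 16  # "LoRA r/α" row
PEAK_LR = 2e-4              # hypothetical: no learning rate is given in the card


def lr_at(step, total_steps=TOTAL_STEPS, warmup_frac=WARMUP_FRAC, peak_lr=PEAK_LR):
    """Linear warm-up to peak_lr, then cosine decay to zero (assumed shape)."""
    warmup_steps = round(total_steps * warmup_frac)
    if step < warmup_steps:
        return peak_lr * step / warmup_steps
    progress = (step - warmup_steps) / (total_steps - warmup_steps)
    return 0.5 * peak_lr * (1.0 + math.cos(math.pi * progress))


# Standard LoRA scales the low-rank update by alpha / r.
lora_scale = LORA_ALPHA / LORA_R
```

Under these assumptions warm-up lasts 55 steps, and the adapter update is scaled by 16 / 64 = 0.25.
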

---

## 📈 Evaluation

| Metric  | Base Qwen2.5-3B | **This model** |
| ------- | --------------- | -------------- |
| ROUGE-L | 45.6            | **57.2**       |
| BLEU-4  | 30.4            | **42.8**       |

> See `eval/` for scripts and raw scores (ROUGE, BLEU).

---

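## 🔗 Integration recipe (RAG)

```python
# Newer LangChain releases ship these integrations in `langchain_community`.
from langchain_community.vectorstores import FAISS  # or llama-index
from langchain_community.embeddings import HuggingFaceEmbeddings

texts = ["first document chunk", "second document chunk"]  # placeholder chunks
# Unit-length embeddings make FAISS's default L2 ranking match cosine ranking.
emb = HuggingFaceEmbeddings(model_name="BAAI/bge-small-en-v1.5", encode_kwargs={"normalize_embeddings": True})
vs = FAISS.from_texts(texts, emb)
```

End-to-end retrieve-and-generate latency averages 330 ms on GPU and 1.9 s on CPU (GGUF, 4-bit).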
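
The retrieve-then-generate flow can be sketched without any dependencies. This is an illustration only: the toy 2-D vectors stand in for real embeddings, and the `top_k` and `build_prompt` helpers, along with the prompt wording, are hypothetical rather than part of this repository.

```python
import math


def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def top_k(query_vec, chunk_vecs, k=2):
    """Indices of the k chunks most similar to the query, best first."""
    ranked = sorted(
        range(len(chunk_vecs)),
        key=lambda i: cosine(query_vec, chunk_vecs[i]),
        reverse=True,
    )
    return ranked[:k]


def build_prompt(question, chunks):
    """Hypothetical template: numbered context chunks followed by the question."""
    context = "\n".join(f"[{i + 1}] {c}" for i, c in enumerate(chunks))
    return (
        "Answer using only the context below.\n\n"
        f"{context}\n\nQuestion: {question}\nAnswer:"
    )


# Toy data: three placeholder chunks and made-up 2-D "embeddings".
chunks = ["chunk about CRAQ reads", "chunk about 3FS layout", "unrelated chunk"]
chunk_vecs = [[1.0, 0.1], [0.2, 1.0], [-1.0, 0.0]]
query_vec = [0.9, 0.2]

best = top_k(query_vec, chunk_vecs, k=2)
prompt = build_prompt("How does CRAQ handle reads?", [chunks[i] for i in best])
```

The resulting `prompt` string is what would be passed to the text-generation pipeline or to llama.cpp.
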

---

## 💡 Why it should trend

* **Fresh domain niche** – deep systems-engineering Q&A is underserved on HF.
* **Ultra-portable** – 4-bit LoRA + GGUF = laptop-friendly.
* **Full stack repo** – weights, notebook, RAG demo, eval scripts.
* **Eye-catching tags** – `qwen2`, `lora`, `rag`, `research` map directly to popular HF filters and the trending feed.
* **Clear usage code** – copy-run experience = more downloads.

---

## ⚠️ Limitations & responsible use

* Trained solely on English data; quality degrades sharply on non-English queries.
* Answers may quote the training docs verbatim or paraphrase them closely.
* Not suitable for critical medical or legal advice.
* LoRA adapters are GPL-3.0; commercial use must comply with both GPL-3.0 and the Qwen 2.5 base license.

---

## ✍️ Citation

```bibtex
@misc{ranuga_disansa_gamage_2025,
  author    = { Ranuga Disansa Gamage and Rivindu Ashinsa and Thuan Naheem and Sanila Wijesekara },
  title     = { ResearchQwen-2.5-3B-LoRA (Revision 7ea9f5f) },
  year      = 2025,
  url       = { https://huggingface.co/Programmer-RD-AI/ResearchQwen-2.5-3B-LoRA },
  doi       = { 10.57967/hf/5623 },
  publisher = { Hugging Face }
}
```