---
license: cc
language:
- en
base_model:
- Qwen/Qwen2.5-3B
tags:
- qwen2
- qwen
- text-generation
- question-answering
- research
- engineering
- lora
- 4bit
- bitsandbytes
- faiss
- rag
metrics:
- type: rougeL
  value: 57.2
- type: bleu
  value: 42.8
library_name: transformers
---

# 🛰️ ResearchQwen 2.5-3B-LoRA

**Compact, domain-expert Q&A for systems researchers.**

- **Base model:** [Qwen/Qwen2.5-3B](https://huggingface.co/Qwen/Qwen2.5-3B)
- **Tuning recipe:** 4-bit **QLoRA** with **bitsandbytes** NF4 quantisation
- **Retriever:** FAISS cosine-similarity index over ~33 k document chunks

---

## 🚀 Quick inference

```python
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig, pipeline

model_id = "Programmer-RD-AI/ResearchQwen2.5-3B-LoRA"

tok = AutoTokenizer.from_pretrained(model_id, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    device_map="auto",
    torch_dtype="auto",
    quantization_config=BitsAndBytesConfig(load_in_4bit=True),  # bitsandbytes NF4
)

qa = pipeline("text-generation", model=model, tokenizer=tok)
print(qa("Explain how Chain Replication with Apportioned Queries improves tail-latency.")[0]["generated_text"])
```

### llama.cpp / GGUF

```bash
wget https://huggingface.co/Programmer-RD-AI/ResearchQwen2.5-3B-LoRA/resolve/main/model_Q4_K_M.gguf
./main -m model_Q4_K_M.gguf -p "Give the core idea of the 3FS log-structured layout in 3 sentences."
```

---

## 📚 Training data

| Source                     | Docs   | Words     |
| -------------------------- | ------ | --------- |
| 3FS white-paper            | 14     | 162 k     |
| CRAQ spec + benchmarks     | 11     | 119 k     |
| Distributed AI infra notes | 32     | 287 k     |
| *Total*                    | **57** | **568 k** |

Synthetic Q&A pairs were generated with an instruction template tuned for factual density; unhelpful pairs were filtered via a weak-to-strong scoring cascade (ROUGE-L > 0.4, BLEU > 0.35).

---

## 🛠️ Fine-tuning details

| Setting     | Value                                           |
| ----------- | ----------------------------------------------- |
| GPU         | 1× A100 40 GB                                   |
| Precision   | 4-bit NF4 w/ double-quant (bitsandbytes 0.45.4) |
| LoRA r/α    | 64 / 16                                         |
| LR schedule | cosine, 5 % warm-up                             |
| Steps       | 1 100                                           |
| Epochs      | 3                                               |
| Peak VRAM   | 21 GB                                           |

---

## 📈 Evaluation

| Metric  | Base Qwen2.5-3B | **This model** |
| ------- | --------------- | -------------- |
| ROUGE-L | 45.6            | **57.2**       |
| BLEU-4  | 30.4            | **42.8**       |

> See `eval/` for scripts and raw scores (ROUGE, BLEU); a rough reproduction sketch is also included near the bottom of this card.

---

## 🔗 Integration recipe (RAG)

```python
from langchain.vectorstores import FAISS  # or llama-index
from langchain.embeddings import HuggingFaceEmbeddings

# `texts` is your list of pre-chunked document strings.
emb = HuggingFaceEmbeddings(model_name="BAAI/bge-small-en-v1.5")
vs = FAISS.from_texts(texts, emb)
```

An end-to-end retrieval-plus-generation sketch is shown near the bottom of this card.

Average retriever + generator latency: 330 ms (GPU), 1.9 s (CPU, GGUF int4).

---

## 💡 Why it should trend

* **Fresh domain niche** – deep systems-engineering Q&A is underserved on HF.
* **Ultra-portable** – 4-bit LoRA + GGUF = laptop-friendly.
* **Full-stack repo** – weights, notebook, RAG demo, eval scripts.
* **Eye-catching tags** – `qwen2`, `lora`, `rag`, `research` map directly to popular HF filters and the trending feed.
* **Clear usage code** – copy-run experience = more downloads.

---

## ⚠️ Limitations & responsible use

* Trained solely on English; non-English queries degrade sharply.
* Answers may quote or paraphrase the training documents verbatim.
* Not suitable for critical medical or legal advice.
* LoRA adapters are GPL-3.0; commercial use must comply with both GPL-3.0 and the Qwen 2.5 base-model license.
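
---

## 🧪 End-to-end RAG example (sketch)

The integration recipe above only builds the vector store; the snippet below stitches it together with the quick-inference pipeline. Treat it as a minimal sketch: the placeholder corpus `texts`, the retrieval depth `k=4`, and the prompt template are illustrative assumptions, not part of the released repo.

```python
from langchain.vectorstores import FAISS
from langchain.embeddings import HuggingFaceEmbeddings
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig, pipeline

model_id = "Programmer-RD-AI/ResearchQwen2.5-3B-LoRA"

# Generator: same 4-bit setup as the quick-inference example above.
tok = AutoTokenizer.from_pretrained(model_id, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    device_map="auto",
    torch_dtype="auto",
    quantization_config=BitsAndBytesConfig(load_in_4bit=True),
)
generate = pipeline("text-generation", model=model, tokenizer=tok)

# Retriever: FAISS store over your own pre-chunked corpus (placeholder chunks here).
texts = [
    "CRAQ lets every replica in the chain serve reads once a version is clean ...",
    "3FS appends updates to a log-structured layout and compacts segments later ...",
]
emb = HuggingFaceEmbeddings(model_name="BAAI/bge-small-en-v1.5")
vs = FAISS.from_texts(texts, emb)

def answer(question: str, k: int = 4) -> str:
    """Retrieve the top-k chunks and prepend them to the question."""
    chunks = vs.similarity_search(question, k=k)
    context = "\n\n".join(doc.page_content for doc in chunks)
    prompt = f"Context:\n{context}\n\nQuestion: {question}\nAnswer:"
    return generate(prompt, max_new_tokens=256)[0]["generated_text"]

print(answer("How does CRAQ keep reads consistent while spreading them across the chain?"))
```

---

## 🔁 Reproducing the evaluation (sketch)

The scores in the table above come from the scripts in `eval/`, which remain the reference. As a rough illustration only, assuming the Hugging Face `evaluate` library and hypothetical `predictions` / `references` lists built from the held-out Q&A pairs, ROUGE-L and BLEU can be computed like this:

```python
import evaluate

# Hypothetical lists: model answers vs. gold answers for the held-out Q&A pairs.
predictions = ["CRAQ apportions reads across the chain ..."]
references = ["CRAQ lets every node in the chain serve clean reads ..."]

rouge = evaluate.load("rouge")
bleu = evaluate.load("bleu")

print(rouge.compute(predictions=predictions, references=references)["rougeL"])
print(bleu.compute(predictions=predictions, references=[[r] for r in references])["bleu"])
```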
---

## ✍️ Citation

```bibtex
@misc{ranuga_disansa_gamage_2025,
  author    = {Ranuga Disansa Gamage and Rivindu Ashinsa and Thuan Naheem and Sanila Wijesekara},
  title     = {ResearchQwen-2.5-3B-LoRA (Revision 7ea9f5f)},
  year      = 2025,
  url       = {https://huggingface.co/Programmer-RD-AI/ResearchQwen-2.5-3B-LoRA},
  doi       = {10.57967/hf/5623},
  publisher = {Hugging Face}
}
```