Instructions for using Programmer-RD-AI/ResearchQwen-2.5-3B-LoRA with libraries, inference providers, notebooks, and local apps. Follow the links below to get started.
- Libraries
- Transformers
How to use Programmer-RD-AI/ResearchQwen-2.5-3B-LoRA with Transformers:
```python
# Use a pipeline as a high-level helper
from transformers import pipeline

# This is a causal LM fine-tuned for Q&A, so use the text-generation task
pipe = pipeline("text-generation", model="Programmer-RD-AI/ResearchQwen-2.5-3B-LoRA")
messages = [
    {"role": "user", "content": "Who are you?"},
]
pipe(messages)
```

```python
# Load model directly
from transformers import AutoModelForCausalLM

model = AutoModelForCausalLM.from_pretrained("Programmer-RD-AI/ResearchQwen-2.5-3B-LoRA", dtype="auto")
```

- llama-cpp-python
How to use Programmer-RD-AI/ResearchQwen-2.5-3B-LoRA with llama-cpp-python:
```python
# !pip install llama-cpp-python
from llama_cpp import Llama

llm = Llama.from_pretrained(
    repo_id="Programmer-RD-AI/ResearchQwen-2.5-3B-LoRA",
    filename="ResearchQwen2.5-3B-LoRA-Q4_K_M.gguf",
)
```
```python
llm.create_chat_completion(
    # messages must be a list of role/content dicts, not a raw JSON string
    messages=[
        {
            "role": "user",
            "content": "Question: What is my name?\nContext: My name is Clara and I live in Berkeley.",
        },
    ]
)
```

- Notebooks
- Google Colab
- Kaggle
- Local Apps
- llama.cpp
How to use Programmer-RD-AI/ResearchQwen-2.5-3B-LoRA with llama.cpp:
Install from Homebrew (macOS/Linux)
```sh
brew install llama.cpp

# Start a local OpenAI-compatible server with a web UI:
llama-server -hf Programmer-RD-AI/ResearchQwen-2.5-3B-LoRA:Q4_K_M

# Run inference directly in the terminal:
llama-cli -hf Programmer-RD-AI/ResearchQwen-2.5-3B-LoRA:Q4_K_M
```
Install from WinGet (Windows)
```sh
winget install llama.cpp

# Start a local OpenAI-compatible server with a web UI:
llama-server -hf Programmer-RD-AI/ResearchQwen-2.5-3B-LoRA:Q4_K_M

# Run inference directly in the terminal:
llama-cli -hf Programmer-RD-AI/ResearchQwen-2.5-3B-LoRA:Q4_K_M
```
Use pre-built binary
```sh
# Download a pre-built binary from:
# https://github.com/ggerganov/llama.cpp/releases

# Start a local OpenAI-compatible server with a web UI:
./llama-server -hf Programmer-RD-AI/ResearchQwen-2.5-3B-LoRA:Q4_K_M

# Run inference directly in the terminal:
./llama-cli -hf Programmer-RD-AI/ResearchQwen-2.5-3B-LoRA:Q4_K_M
```
Build from source code
```sh
git clone https://github.com/ggerganov/llama.cpp.git
cd llama.cpp
cmake -B build
cmake --build build -j --target llama-server llama-cli

# Start a local OpenAI-compatible server with a web UI:
./build/bin/llama-server -hf Programmer-RD-AI/ResearchQwen-2.5-3B-LoRA:Q4_K_M

# Run inference directly in the terminal:
./build/bin/llama-cli -hf Programmer-RD-AI/ResearchQwen-2.5-3B-LoRA:Q4_K_M
```
Use Docker
```sh
docker model run hf.co/Programmer-RD-AI/ResearchQwen-2.5-3B-LoRA:Q4_K_M
```
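Whichever install path you use, llama-server exposes an OpenAI-compatible API (port 8080 by default), so any OpenAI client can talk to it. A minimal sketch with the `openai` Python package; the prompt is illustrative and the API key is a placeholder, since the local server does not check one:

```python
# Minimal sketch: chat with the local llama-server via its OpenAI-compatible API.
# Assumes a server started with one of the commands above, on the default port 8080.
from openai import OpenAI

client = OpenAI(base_url="http://localhost:8080/v1", api_key="none")  # key is unused locally
resp = client.chat.completions.create(
    model="Programmer-RD-AI/ResearchQwen-2.5-3B-LoRA:Q4_K_M",
    messages=[{"role": "user", "content": "Summarise CRAQ in two sentences."}],
)
print(resp.choices[0].message.content)
```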
- LM Studio
- Jan
- Ollama
How to use Programmer-RD-AI/ResearchQwen-2.5-3B-LoRA with Ollama:
```sh
ollama run hf.co/Programmer-RD-AI/ResearchQwen-2.5-3B-LoRA:Q4_K_M
```
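Once pulled, the model can also be called programmatically. A minimal sketch using the `ollama` Python package (an assumption; any OpenAI-compatible client pointed at Ollama's endpoint works as well):

```python
# Minimal sketch: query the model through Ollama's Python client.
# Assumes `ollama run hf.co/...` above has already pulled the model.
import ollama

resp = ollama.chat(
    model="hf.co/Programmer-RD-AI/ResearchQwen-2.5-3B-LoRA:Q4_K_M",
    messages=[{"role": "user", "content": "What problem does CRAQ solve?"}],
)
print(resp["message"]["content"])
```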
- Unsloth Studio
How to use Programmer-RD-AI/ResearchQwen-2.5-3B-LoRA with Unsloth Studio:
Install Unsloth Studio (macOS, Linux, WSL)
```sh
curl -fsSL https://unsloth.ai/install.sh | sh

# Run Unsloth Studio
unsloth studio -H 0.0.0.0 -p 8888

# Then open http://localhost:8888 in your browser
# Search for Programmer-RD-AI/ResearchQwen-2.5-3B-LoRA to start chatting
```
Install Unsloth Studio (Windows)
```sh
irm https://unsloth.ai/install.ps1 | iex

# Run Unsloth Studio
unsloth studio -H 0.0.0.0 -p 8888

# Then open http://localhost:8888 in your browser
# Search for Programmer-RD-AI/ResearchQwen-2.5-3B-LoRA to start chatting
```
Using HuggingFace Spaces for Unsloth
```sh
# No setup required
# Open https://huggingface.co/spaces/unsloth/studio in your browser
# Search for Programmer-RD-AI/ResearchQwen-2.5-3B-LoRA to start chatting
```
- Pi
How to use Programmer-RD-AI/ResearchQwen-2.5-3B-LoRA with Pi:
Start the llama.cpp server
```sh
# Install llama.cpp:
brew install llama.cpp

# Start a local OpenAI-compatible server:
llama-server -hf Programmer-RD-AI/ResearchQwen-2.5-3B-LoRA:Q4_K_M
```
Configure the model in Pi
```sh
# Install Pi:
npm install -g @mariozechner/pi-coding-agent
```

Add the following to ~/.pi/agent/models.json:

```json
{
  "providers": {
    "llama-cpp": {
      "baseUrl": "http://localhost:8080/v1",
      "api": "openai-completions",
      "apiKey": "none",
      "models": [
        { "id": "Programmer-RD-AI/ResearchQwen-2.5-3B-LoRA:Q4_K_M" }
      ]
    }
  }
}
```

Run Pi
```sh
# Start Pi in your project directory:
pi
```
- Hermes Agent
How to use Programmer-RD-AI/ResearchQwen-2.5-3B-LoRA with Hermes Agent:
Start the llama.cpp server
```sh
# Install llama.cpp:
brew install llama.cpp

# Start a local OpenAI-compatible server:
llama-server -hf Programmer-RD-AI/ResearchQwen-2.5-3B-LoRA:Q4_K_M
```
Configure Hermes
```sh
# Install Hermes:
curl -fsSL https://hermes-agent.nousresearch.com/install.sh | bash
hermes setup

# Point Hermes at the local server:
hermes config set model.provider custom
hermes config set model.base_url http://127.0.0.1:8080/v1
hermes config set model.default Programmer-RD-AI/ResearchQwen-2.5-3B-LoRA:Q4_K_M
```
Run Hermes
```sh
hermes
```
- Docker Model Runner
How to use Programmer-RD-AI/ResearchQwen-2.5-3B-LoRA with Docker Model Runner:
```sh
docker model run hf.co/Programmer-RD-AI/ResearchQwen-2.5-3B-LoRA:Q4_K_M
```
- Lemonade
How to use Programmer-RD-AI/ResearchQwen-2.5-3B-LoRA with Lemonade:
Pull the model
```sh
# Download Lemonade from https://lemonade-server.ai/
lemonade pull Programmer-RD-AI/ResearchQwen-2.5-3B-LoRA:Q4_K_M
```
Run and chat with the model
```sh
lemonade run user.ResearchQwen-2.5-3B-LoRA-Q4_K_M
```
List all available models
```sh
lemonade list
```
🛰️ ResearchQwen 2.5-3B-LoRA
Compact, domain-expert Q&A for systems researchers.
- Base model: Qwen/Qwen2.5-3B
- Tuning recipe: 4-bit QLoRA with bitsandbytes NF4 quantisation
- Retriever: FAISS cosine-similarity store over ~33 k document chunks (sketched below)
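A minimal sketch of such a cosine-similarity store; the embedding model matches the RAG recipe further down, while the example chunks and query are illustrative assumptions (cosine similarity falls out of normalised vectors plus an inner-product index):

```python
# Minimal sketch of a FAISS cosine-similarity store like the one described above.
# Embedding model from the RAG recipe below; chunks/query are illustrative.
import faiss
from sentence_transformers import SentenceTransformer

encoder = SentenceTransformer("BAAI/bge-small-en-v1.5")
chunks = [
    "CRAQ apportions read queries across chain replicas.",
    "3FS uses a log-structured on-disk layout.",
]

# Normalised embeddings + inner-product index == cosine similarity
vecs = encoder.encode(chunks, normalize_embeddings=True).astype("float32")
index = faiss.IndexFlatIP(vecs.shape[1])
index.add(vecs)

query = encoder.encode(["How does CRAQ reduce tail latency?"], normalize_embeddings=True).astype("float32")
scores, ids = index.search(query, 2)
print([(chunks[i], float(s)) for i, s in zip(ids[0], scores[0])])
```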
🚀 Quick inference
```python
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig, pipeline

model_id = "Programmer-RD-AI/ResearchQwen-2.5-3B-LoRA"
tok = AutoTokenizer.from_pretrained(model_id, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    device_map="auto",
    quantization_config=BitsAndBytesConfig(load_in_4bit=True),  # uses bitsandbytes
)
qa = pipeline("text-generation", model=model, tokenizer=tok)
print(qa("Explain how Chain Replication with Apportioned Queries improves tail latency."))
```
llama.cpp / GGUF
```sh
wget https://huggingface.co/Programmer-RD-AI/ResearchQwen-2.5-3B-LoRA/resolve/main/model_Q4_K_M.gguf
./llama-cli -m model_Q4_K_M.gguf -p "Give the core idea of the 3FS log-structured layout in 3 sentences."
```
📚 Training data
| Source | Docs | Words |
|---|---|---|
| 3FS white-paper | 14 | 162 k |
| CRAQ spec + benchmarks | 11 | 119 k |
| Distributed AI infra notes | 32 | 287 k |
| Total | 57 | 568 k |
Synthetic Q&A pairs were generated with an instruction template tuned for factual density; unhelpful pairs were filtered via a weak-to-strong scoring cascade (ROUGE-L > 0.4, BLEU > 0.35) ([GitHub][1]).
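The exact filtering script isn't included here; a hedged sketch of a filter consistent with those thresholds, using the `rouge_score` and `sacrebleu` packages (library choice is an assumption):

```python
# Hedged sketch: keep a synthetic Q&A pair only if it clears both thresholds above.
# The rouge_score/sacrebleu pairing is an assumption, not the authors' script.
from rouge_score import rouge_scorer
import sacrebleu

scorer = rouge_scorer.RougeScorer(["rougeL"], use_stemmer=True)

def keep_pair(answer: str, reference: str) -> bool:
    rouge_l = scorer.score(reference, answer)["rougeL"].fmeasure
    bleu = sacrebleu.sentence_bleu(answer, [reference]).score / 100.0  # sacrebleu is 0-100
    return rouge_l > 0.4 and bleu > 0.35

print(keep_pair("CRAQ spreads reads across replicas.", "CRAQ apportions read queries across chain replicas."))
```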
🛠️ Fine-tuning details
| Setting | Value |
|---|---|
| GPU | 1× A100 40 GB |
| Precision | 4-bit NF4 w/ double-quant (bnb 0.45.4) |
| LoRA r/α | 64 / 16 |
| LR sched | cosine, 5 % warm-up |
| Steps | 1 100 |
| Epochs | 3 |
| Peak VRAM | 21 GB |
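A hedged sketch of this recipe expressed as `peft`/`bitsandbytes`/`transformers` configs; the numeric values come from the table above, while `target_modules` is an assumption (a common choice for Qwen2.5):

```python
# Hedged sketch of the table above as training configs. r=64, alpha=16,
# NF4 + double-quant, 3 epochs, cosine LR with 5% warmup come from the table;
# target_modules is an assumed, typical choice for Qwen2.5.
from transformers import BitsAndBytesConfig, TrainingArguments
from peft import LoraConfig

bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_use_double_quant=True,
)
lora_config = LoraConfig(
    r=64,
    lora_alpha=16,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],  # assumed
    task_type="CAUSAL_LM",
)
training_args = TrainingArguments(
    output_dir="researchqwen-lora",
    num_train_epochs=3,
    lr_scheduler_type="cosine",
    warmup_ratio=0.05,
)
```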
📈 Evaluation
| Metric | Base Qwen2.5-3B | This model |
|---|---|---|
| ROUGE-L | 45.6 | 57.2 |
| BLEU-4 | 30.4 | 42.8 |
See `eval/` for scripts and raw scores (ROUGE, BLEU).
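A hedged sketch of how comparable scores can be computed with the `evaluate` library (the actual scripts in `eval/` may differ):

```python
# Hedged sketch: ROUGE-L / BLEU scoring with the evaluate library.
# Predictions/references are illustrative; see eval/ for the real scripts.
import evaluate

rouge = evaluate.load("rouge")
bleu = evaluate.load("bleu")

preds = ["CRAQ apportions reads across chain replicas."]
refs = ["CRAQ apportions read queries across all chain replicas."]

print(rouge.compute(predictions=preds, references=refs)["rougeL"])
print(bleu.compute(predictions=preds, references=refs)["bleu"])
```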
🔗 Integration recipe (RAG)
```python
from langchain.vectorstores import FAISS  # or llama-index
from langchain.embeddings import HuggingFaceEmbeddings

texts = ["..."]  # your document chunks, e.g. the ~33 k chunks described above
emb = HuggingFaceEmbeddings(model_name="BAAI/bge-small-en-v1.5")
vs = FAISS.from_texts(texts, emb)
```
Retriever-generator latency: 330 ms average (GPU), 1.9 s average (CPU, gguf-int4).
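A hedged continuation of the recipe: retrieve the top-k chunks and hand them to the fine-tuned model. The prompt template is an assumption; `qa` is the text-generation pipeline from Quick inference above:

```python
# Hedged continuation: retrieve context, then answer with the fine-tuned model.
# Prompt template is an assumption; `qa` is the pipeline built above.
question = "How does CRAQ handle reads?"
docs = vs.similarity_search(question, k=3)
context = "\n".join(d.page_content for d in docs)
prompt = f"Context:\n{context}\n\nQuestion: {question}\nAnswer:"
print(qa(prompt, max_new_tokens=256)[0]["generated_text"])
```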
💡 Why it should trend
- Fresh domain niche – deep systems-engineering Q&A is underserved on HF.
- Ultra-portable – 4-bit LoRA + GGUF = laptop-friendly.
- Full stack repo – weights, notebook, RAG demo, eval scripts.
- Eye-catching tags – `qwen2`, `lora`, `rag`, `research` map directly to popular HF filters and the trending feed ([Hugging Face][4]).
- Clear usage code – copy-run experience = more downloads.
⚠️ Limitations & responsible use
- Trained solely on English; non-English queries degrade sharply.
- Answers may quote or paraphrase the training docs verbatim.
- Not suitable for critical medical / legal advice.
- LoRA adapters are GPL-3.0; commercial use must comply with both GPL-3.0 and the Qwen 2.5 base license.
✍️ Citation
```bibtex
@misc{ranuga_disansa_gamage_2025,
  author    = {Ranuga Disansa Gamage and Rivindu Ashinsa and Thuan Naheem and Sanila Wijesekara},
  title     = {ResearchQwen-2.5-3B-LoRA (Revision 7ea9f5f)},
  year      = 2025,
  url       = {https://huggingface.co/Programmer-RD-AI/ResearchQwen-2.5-3B-LoRA},
  doi       = {10.57967/hf/5623},
  publisher = {Hugging Face}
}
```