QMD Query Expansion: LFM2-1.2B GGUF
GGUF quantization of a fine-tuned LiquidAI/LFM2-1.2B for structured query expansion in qmd, a local-first document search engine.
Fine-tuned adapter: OrcsRise/qmd-query-expansion-lfm2-sft
Quantizations
| File | Quantization | Size | Use Case |
|---|---|---|---|
| `qmd-query-expansion-lfm2-q8_0.gguf` | Q8_0 | 1.19 GB | Recommended; near-original quality |
Quick Start with qmd
# Set as your qmd query expansion model
export QMD_GEN_MODEL="hf:OrcsRise/qmd-query-expansion-lfm2-gguf/qmd-query-expansion-lfm2-q8_0.gguf"
# Add to ~/.zshrc or ~/.bashrc for persistence
echo 'export QMD_GEN_MODEL="hf:OrcsRise/qmd-query-expansion-lfm2-gguf/qmd-query-expansion-lfm2-q8_0.gguf"' >> ~/.zshrc
# qmd auto-downloads the GGUF on first use
qmd query "your search query"
The model is automatically downloaded to ~/.cache/qmd/models/ on first run.
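The `QMD_GEN_MODEL` value above appears to follow an `hf:<owner>/<repo>/<filename>` layout. The sketch below composes and splits such a string; the layout is inferred from the example only, and `build_model_uri` and `split_model_uri` are hypothetical helper names, not part of qmd.

```python
# Assumption: the URI layout "hf:<owner>/<repo>/<filename>" is inferred from
# the QMD_GEN_MODEL example in this card, not from qmd's documentation.

def build_model_uri(repo_id: str, filename: str) -> str:
    """Join a Hugging Face repo id and a GGUF filename into an hf: URI."""
    return f"hf:{repo_id}/{filename}"


def split_model_uri(uri: str) -> tuple[str, str]:
    """Split an hf: URI back into (repo_id, filename)."""
    if not uri.startswith("hf:"):
        raise ValueError(f"not an hf: URI: {uri!r}")
    # The first two path segments form the repo id; the rest is the filename.
    owner, repo, filename = uri[len("hf:"):].split("/", 2)
    return f"{owner}/{repo}", filename


if __name__ == "__main__":
    uri = build_model_uri(
        "OrcsRise/qmd-query-expansion-lfm2-gguf",
        "qmd-query-expansion-lfm2-q8_0.gguf",
    )
    print(uri)
    print(split_model_uri(uri))
```

This can help when scripting the environment setup for several machines or quantization files.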
What This Model Does
Given a short search query, the model generates structured expansions in three formats for hybrid search:
| Prefix | Purpose | Example |
|---|---|---|
| `lex:` | Lexical keywords for BM25/FTS5 search | `lex: docker container timeout settings` |
| `vec:` | Natural language for vector similarity search | `vec: how to configure docker container timeout` |
| `hyde:` | Hypothetical document for HyDE retrieval | `hyde: Docker containers can be configured with timeout settings using the --stop-timeout flag...` |
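If you call the model outside qmd, the prefixed lines need to be routed to the matching search backend. The following is a minimal parsing sketch, assuming the model emits one expansion per line in the `prefix: text` form shown in the table; `Expansion` and `parse_expansion` are hypothetical names, and qmd's own parser may behave differently.

```python
from dataclasses import dataclass, field

# Prefixes taken from the table above.
PREFIXES = ("lex", "vec", "hyde")


@dataclass
class Expansion:
    lex: list[str] = field(default_factory=list)   # keyword queries (BM25/FTS5)
    vec: list[str] = field(default_factory=list)   # natural-language queries
    hyde: list[str] = field(default_factory=list)  # hypothetical documents


def parse_expansion(text: str) -> Expansion:
    """Collect 'lex:', 'vec:' and 'hyde:' lines; ignore everything else."""
    result = Expansion()
    for raw_line in text.splitlines():
        # Split on the first colon only, so colons inside the text survive.
        prefix, sep, value = raw_line.strip().partition(":")
        key = prefix.strip().lower()
        value = value.strip()
        if sep and key in PREFIXES and value:
            getattr(result, key).append(value)
    return result


if __name__ == "__main__":
    sample = (
        "lex: docker container timeout settings\n"
        "vec: how to configure docker container timeout\n"
    )
    print(parse_expansion(sample))
```

Lines without a known prefix are dropped silently, which keeps the parser tolerant of stray output from the model.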
Why LFM2 over Qwen3?
This is an alternative to qmd's default Qwen3-1.7B query expansion model, added in qmd v1.0.7.
| | LFM2-1.2B (this) | Qwen3-1.7B (default) |
|---|---|---|
| Parameters | 1.2B | 1.7B |
| Architecture | Hybrid (convolutions + attention) | Standard transformer |
| Decode/prefill speed | ~2x faster | Baseline |
| Q8_0 size | 1.19 GB | ~1.7 GB |
| Best for | On-device, latency-sensitive | Maximum quality |
LFM2's hybrid architecture makes it ideal for on-device inference where latency and memory matter more than marginal quality differences.
Training
- Method: SFT with LoRA (rank 16, alpha 32)
- Dataset: tobil/qmd-query-expansion-train (5,157 examples)
- LoRA targets: q_proj, k_proj, v_proj, out_proj, in_proj, w1, w2, w3
- Epochs: 5
- Hardware: NVIDIA Tesla T4 (Google Colab, free tier)
- Training time: ~2.5 hours
See the SFT adapter card for full training details.
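To make the LoRA hyperparameters above concrete, here is a small sketch that records them and derives two quantities: the scaling factor and the number of trainable parameters added per adapted weight matrix. It assumes the conventional LoRA scaling of alpha / rank; `LoraSettings` is a hypothetical container, and the 2048 x 2048 matrix shape in the usage lines is an illustrative value, not a figure from the LFM2 architecture.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class LoraSettings:
    rank: int = 16
    alpha: int = 32
    epochs: int = 5
    target_modules: tuple[str, ...] = (
        "q_proj", "k_proj", "v_proj", "out_proj", "in_proj", "w1", "w2", "w3",
    )

    @property
    def scaling(self) -> float:
        # Assumption: conventional LoRA scales the low-rank update by alpha / rank.
        return self.alpha / self.rank

    def added_params(self, d_in: int, d_out: int) -> int:
        # One adapter adds A (rank x d_in) and B (d_out x rank).
        return self.rank * (d_in + d_out)


if __name__ == "__main__":
    settings = LoraSettings()
    print(settings.scaling)
    # Hypothetical 2048 x 2048 projection, for illustration only.
    print(settings.added_params(2048, 2048))
```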
Recommended Generation Parameters
| Parameter | Value |
|---|---|
| Temperature | 0.3 |
| min_p | 0.15 |
| Repetition penalty | 1.05 |
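For readers unfamiliar with `min_p`, the sketch below shows what the filter does to a token distribution: tokens whose probability falls below `min_p` times the top token's probability are discarded, and the rest are renormalized. This is a simplified illustration in plain Python, not the sampler used by llama.cpp or qmd; the order in which real engines apply temperature and `min_p` may differ, and the dictionary keys are illustrative names.

```python
import math

# Values from the table above; the key names are illustrative only.
GENERATION_PARAMS = {"temperature": 0.3, "min_p": 0.15, "repetition_penalty": 1.05}


def softmax(logits: list[float], temperature: float = 1.0) -> list[float]:
    """Temperature-scaled softmax, shifted by the max for numerical stability."""
    scaled = [x / temperature for x in logits]
    peak = max(scaled)
    exps = [math.exp(x - peak) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def min_p_filter(probs: list[float], min_p: float) -> list[float]:
    """Zero out tokens below min_p * max(probs), then renormalize."""
    threshold = min_p * max(probs)
    kept = [p if p >= threshold else 0.0 for p in probs]
    total = sum(kept)
    return [p / total for p in kept]


if __name__ == "__main__":
    # Hypothetical logits for three candidate tokens.
    probs = softmax([2.0, 1.0, -3.0], GENERATION_PARAMS["temperature"])
    print(min_p_filter(probs, GENERATION_PARAMS["min_p"]))
```

A low temperature combined with a fairly high `min_p` leaves few candidate tokens, which suits the short, structured output this model is trained to produce.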
Compatibility
This GGUF works with any inference engine that supports the LFM2 architecture:
- qmd (via node-llama-cpp): primary use case
- llama.cpp (b5921+)
- Ollama
- LM Studio
Related
- qmd: Local-first document search with BM25 + vector + LLM reranking
- LiquidAI/LFM2-1.2B: Base model
- LiquidAI/LFM2-1.2B-GGUF: Official base model GGUFs (not fine-tuned)
- OrcsRise/qmd-query-expansion-lfm2-sft: SFT LoRA adapter
- tobil/qmd-query-expansion-1.7B-gguf: Default qmd model (Qwen3-1.7B)
- tobil/qmd-query-expansion-train: Training dataset
License
Apache 2.0, same as the base LFM2-1.2B model.