CodeClawd — Qwen3.5-9B Claude Code (GGUF)
GGUF quantizations of empero-ai/Qwen3.5-9B-Claude-Code, a supervised fine-tune of empero-ai/Qwen3.5-9B-Claude-Opus-4.6-Distill on real Claude Code / Codex agentic sessions from the DataClaw dataset collection.
The model is trained to behave like a capable coding agent: thinking through problems, calling tools, interpreting results, and responding clearly — mirroring the full agentic loop of a real Claude Code session.
The model reasons inside <think> tags and calls tools with XML-style <tool_call> blocks.
Available Quantizations
| Filename | Quant | Size | RAM Required | Quality |
|---|---|---|---|---|
| Qwen3.5-9B-Claude-Code-Q2_K.gguf | Q2_K | ~3.6 GB | ~6 GB | Lowest (not recommended) |
| Qwen3.5-9B-Claude-Code-Q3_K_M.gguf | Q3_K_M | ~4.5 GB | ~7 GB | Low |
| Qwen3.5-9B-Claude-Code-Q4_0.gguf | Q4_0 | ~5.2 GB | ~8 GB | Acceptable |
| Qwen3.5-9B-Claude-Code-Q4_K_M.gguf | Q4_K_M | ~5.4 GB | ~8 GB | Recommended (best size/quality tradeoff) |
| Qwen3.5-9B-Claude-Code-Q5_K_M.gguf | Q5_K_M | ~6.4 GB | ~9 GB | High |
| Qwen3.5-9B-Claude-Code-Q6_K.gguf | Q6_K | ~7.3 GB | ~10 GB | Very high |
| Qwen3.5-9B-Claude-Code-Q8_0.gguf | Q8_0 | ~9.5 GB | ~12 GB | Near-lossless |
| Qwen3.5-9B-Claude-Code-F16.gguf | F16 | ~18 GB | ~22 GB | Lossless |
Recommended: Q4_K_M for most users. Q5_K_M or Q6_K if you have extra VRAM and want higher-quality reasoning.
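As a rough aid for choosing a file, the table can be turned into a small lookup. This is only a sketch: the figures are the approximate values listed above, and `pick_quant` and `filename_for` are hypothetical helper names, not part of any tooling shipped with the model.

```python
from typing import Optional

# Approximate figures copied from the quantization table (GB).
QUANTS = [
    # (quant, file size, RAM required)
    ("Q2_K", 3.6, 6),
    ("Q3_K_M", 4.5, 7),
    ("Q4_0", 5.2, 8),
    ("Q4_K_M", 5.4, 8),
    ("Q5_K_M", 6.4, 9),
    ("Q6_K", 7.3, 10),
    ("Q8_0", 9.5, 12),
    ("F16", 18.0, 22),
]


def pick_quant(available_ram_gb: float) -> Optional[str]:
    """Return the largest quant whose listed RAM requirement fits, else None."""
    fitting = [q for q in QUANTS if q[2] <= available_ram_gb]
    if not fitting:
        return None
    # Larger files keep more precision, so prefer the biggest file that fits.
    return max(fitting, key=lambda q: q[1])[0]


def filename_for(quant: str) -> str:
    """Build the GGUF filename used in this repository for a quant label."""
    return f"Qwen3.5-9B-Claude-Code-{quant}.gguf"
```

With 8 GB available this selects Q4_K_M, which matches the recommendation above. Real memory use also grows with context length, so treat the result as a starting point.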
Model Details
| Property | Value |
|---|---|
| Original model | empero-ai/Qwen3.5-9B-Claude-Code |
| Base model | empero-ai/Qwen3.5-9B-Claude-Opus-4.6-Distill |
| Fine-tune type | SFT (LoRA r=64/alpha=128, merged) |
| Parameters | 9B |
| Training data | DataClaw — 29 datasets of real Claude Code / Codex sessions |
| Context length | 4096 tokens |
| License | Apache 2.0 |
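For readers unfamiliar with the "LoRA r=64/alpha=128, merged" entry, it can be read as below, assuming the conventional LoRA scaling of alpha/r. The card does not say whether a variant scaling was used, and the symbols W_0, A, B, d, and k are the usual LoRA notation rather than names taken from this card.

```latex
W_{\text{merged}} = W_0 + \frac{\alpha}{r}\, B A
                  = W_0 + \frac{128}{64}\, B A
                  = W_0 + 2\, B A,
\qquad B \in \mathbb{R}^{d \times r},\quad A \in \mathbb{R}^{r \times k},\quad r = 64
```

"Merged" means the low-rank update has been folded into the weights, so the GGUF files need no separate adapter at inference time.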
Usage with llama.cpp
# Basic inference
./llama-cli \
-m Qwen3.5-9B-Claude-Code-Q4_K_M.gguf \
-n 1024 \
--temp 0.7 \
--top-p 0.9 \
-p "<|im_start|>system
You are an expert AI coding assistant. You help users with software engineering tasks including writing code, debugging, refactoring, explaining code, and more. You think through problems carefully inside <think> tags before acting. You have access to tools like Read, Edit, Write, Bash, Grep, Glob, and others to interact with the user's codebase. You use tools when needed and provide clear, concise responses.<|im_end|>
<|im_start|>user
Read the file main.py and explain what it does.<|im_end|>
<|im_start|>assistant
"
# As an OpenAI-compatible server
./llama-server \
-m Qwen3.5-9B-Claude-Code-Q4_K_M.gguf \
--host 0.0.0.0 \
--port 8080 \
-c 4096 \
-ngl 99
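Once the server is running, any OpenAI-compatible client can talk to it. The following is a minimal standard-library sketch, assuming the server listens on localhost:8080 as in the command above. `build_payload` and `chat` are illustrative names, and the sampling values are the ones used elsewhere in this card.

```python
import json
import urllib.request

BASE_URL = "http://localhost:8080/v1"  # assumes --port 8080 on the same machine

SYSTEM_PROMPT = (
    "You are an expert AI coding assistant. You help users with software "
    "engineering tasks including writing code, debugging, refactoring, "
    "explaining code, and more. You think through problems carefully inside "
    "<think> tags before acting. You have access to tools like Read, Edit, "
    "Write, Bash, Grep, Glob, and others to interact with the user's codebase. "
    "You use tools when needed and provide clear, concise responses."
)


def build_payload(user_message, max_tokens=1024):
    """Build the JSON body for POST /v1/chat/completions."""
    return {
        "messages": [
            {"role": "system", "content": SYSTEM_PROMPT},
            {"role": "user", "content": user_message},
        ],
        "max_tokens": max_tokens,
        "temperature": 0.7,
        "top_p": 0.9,
    }


def chat(user_message, base_url=BASE_URL):
    """Send one user message to the server and return the assistant's text."""
    request = urllib.request.Request(
        f"{base_url}/chat/completions",
        data=json.dumps(build_payload(user_message)).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request) as response:
        body = json.load(response)
    return body["choices"][0]["message"]["content"]


# Example (requires the server to be running):
# print(chat("Read the file main.py and explain what it does."))
```

The server applies the chat template itself on this endpoint, so the `<|im_start|>` markers from the `llama-cli` prompt are not needed here.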
Usage with Ollama
Create a Modelfile:
FROM Qwen3.5-9B-Claude-Code-Q4_K_M.gguf
SYSTEM """You are an expert AI coding assistant. You help users with software engineering tasks including writing code, debugging, refactoring, explaining code, and more. You think through problems carefully inside <think> tags before acting. You have access to tools like Read, Edit, Write, Bash, Grep, Glob, and others to interact with the user's codebase. You use tools when needed and provide clear, concise responses. You show your reasoning process and use tools methodically to accomplish tasks."""
PARAMETER temperature 0.7
PARAMETER top_p 0.9
PARAMETER num_ctx 4096
ollama create codeclawd -f Modelfile
ollama run codeclawd
Usage with LM Studio
- Download the .gguf file of your chosen quantization
- Open LM Studio → Load Model → select the file
- In the system prompt field, paste:
You are an expert AI coding assistant. You help users with software engineering tasks including writing code, debugging, refactoring, explaining code, and more. You think through problems carefully inside <think> tags before acting. You have access to tools like Read, Edit, Write, Bash, Grep, Glob, and others to interact with the user's codebase. You use tools when needed and provide clear, concise responses. You show your reasoning process and use tools methodically to accomplish tasks.
- Set temperature: 0.7, top-p: 0.9, context: 4096
Agentic Format
This model is trained on the full agentic loop:
user → <think>...</think> → <tool_call> → <tool_result> → response → repeat
Tool Call Format
<tool_call>
<tool>Read</tool>
<input>
{"file_path": "/path/to/file.py"}
</input>
</tool_call>
<tool_result>
<tool>Read</tool>
<output>
... file contents ...
</output>
</tool_result>
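Because the format is XML-style rather than OpenAI function-call JSON, the client has to extract tool calls from the generated text itself. A minimal sketch of that step follows, assuming the model emits blocks exactly in the shape shown above. `parse_tool_calls` and `format_tool_result` are hypothetical helper names, and a regular expression is used instead of an XML parser because the surrounding output is free text.

```python
import json
import re

# Matches one <tool_call> block in the layout shown in this card.
TOOL_CALL_RE = re.compile(
    r"<tool_call>\s*<tool>(?P<tool>.*?)</tool>\s*"
    r"<input>(?P<input>.*?)</input>\s*</tool_call>",
    re.DOTALL,
)


def parse_tool_calls(text):
    """Return a list of (tool_name, arguments) pairs found in model output."""
    calls = []
    for match in TOOL_CALL_RE.finditer(text):
        raw = match.group("input").strip()
        try:
            args = json.loads(raw)
        except json.JSONDecodeError:
            # Keep malformed input visible to the caller instead of dropping it.
            args = {"_raw": raw}
        calls.append((match.group("tool").strip(), args))
    return calls


def format_tool_result(tool, output):
    """Wrap real tool output in the <tool_result> layout shown above."""
    return (
        "<tool_result>\n"
        f"<tool>{tool}</tool>\n"
        "<output>\n"
        f"{output}\n"
        "</output>\n"
        "</tool_result>"
    )
```

Text outside the `<tool_call>` blocks is ignored by the parser, so thinking and prose in the same response do not interfere with it.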
Thinking Format
<think>
The user wants to refactor the auth module. I should first read the existing
implementation before suggesting changes.
</think>
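A front end will often want to hide or collapse the reasoning before showing the answer. One way to separate the two is sketched below, assuming well-formed, non-nested `<think>` blocks. `split_thinking` is an illustrative name.

```python
import re

THINK_RE = re.compile(r"<think>(.*?)</think>", re.DOTALL)


def split_thinking(text):
    """Return (thoughts, visible_text) for one model response."""
    thoughts = [block.strip() for block in THINK_RE.findall(text)]
    visible = THINK_RE.sub("", text).strip()
    return thoughts, visible
```

An unterminated `<think>` tag, for example when generation stops at the token limit, is left in the visible text by this sketch and would need separate handling.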
Intended Use
- Local agentic coding assistants on consumer hardware
- IDE integrations (Continue, Cursor, etc.) running locally
- Offline / air-gapped development environments
- Research into agentic SFT distillation from real sessions
Limitations
- Tool format is XML-based — not OpenAI function-call JSON
- SFT only (no RLHF or DPO)
- May hallucinate tool outputs if not connected to a real tool executor
- Quantization below Q4 may degrade reasoning quality noticeably
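To avoid hallucinated tool outputs, the generated text should be cut at the end of each tool call and the result supplied by a real executor. The loop below is a minimal sketch of that idea, not the authors' harness. `agent_loop`, `run_tool`, and `TOOLS` are hypothetical names, only a read-only `Read` tool is implemented, only the first tool call per turn is handled, and sending the result back as a `user` message is an assumption, since the card does not state which role carried `<tool_result>` during training.

```python
import json
import re
from pathlib import Path

TOOL_CALL_RE = re.compile(
    r"<tool_call>\s*<tool>(?P<tool>.*?)</tool>\s*"
    r"<input>(?P<input>.*?)</input>\s*</tool_call>",
    re.DOTALL,
)


def read_file(args):
    """Real implementation of the Read tool: return the file's text."""
    return Path(args["file_path"]).read_text(encoding="utf-8")


TOOLS = {"Read": read_file}  # extend with Edit, Bash, ... as needed


def run_tool(name, args, tools=TOOLS):
    """Run one tool and return its output, or an error string on failure."""
    handler = tools.get(name)
    if handler is None:
        return f"ERROR: tool {name!r} is not implemented in this sketch"
    try:
        return str(handler(args))
    except Exception as exc:  # report failures to the model instead of crashing
        return f"ERROR: {exc}"


def agent_loop(complete, messages, tools=TOOLS, max_turns=8):
    """Alternate between model and executor until the model stops calling tools.

    `complete` is any callable that takes the message list and returns the
    assistant's text, such as a wrapper around an OpenAI-compatible endpoint.
    """
    for _ in range(max_turns):
        reply = complete(messages)
        match = TOOL_CALL_RE.search(reply)
        if match is None:
            messages.append({"role": "assistant", "content": reply})
            return reply
        # Drop anything generated after the call, including invented results.
        messages.append({"role": "assistant", "content": reply[: match.end()]})
        tool = match.group("tool").strip()
        try:
            args = json.loads(match.group("input"))
        except json.JSONDecodeError as exc:
            output = f"ERROR: invalid tool input: {exc}"
        else:
            output = run_tool(tool, args, tools)
        result = (
            f"<tool_result>\n<tool>{tool}</tool>\n"
            f"<output>\n{output}\n</output>\n</tool_result>"
        )
        messages.append({"role": "user", "content": result})  # role is assumed
    raise RuntimeError("max_turns reached without a final answer")
```

Passing `"</tool_call>"` as a stop sequence to the server is another way to achieve the same truncation, at the cost of having to re-append the closing tag.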
Original Model
- HF (safetensors): empero-ai/Qwen3.5-9B-Claude-Code
License
Apache 2.0 — see base model for details.
Model tree for empero-ai/Qwen3.5-9B-Claude-Code-GGUF: base model Qwen/Qwen3.5-9B-Base