Instructions to use empero-ai/Qwen3.5-9B-Claude-Code-GGUF with libraries, inference providers, notebooks, and local apps.

Libraries

How to use empero-ai/Qwen3.5-9B-Claude-Code-GGUF with Transformers:

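# Load model directly (a GGUF repo needs an explicit gguf_file;
# Transformers support for this architecture's GGUF files is an assumption)
from transformers import AutoModelForCausalLM
model = AutoModelForCausalLM.from_pretrained(
    "empero-ai/Qwen3.5-9B-Claude-Code-GGUF", gguf_file="Qwen3.5-9B-Claude-Code-Q4_K_M.gguf", dtype="auto"
)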

llama-cpp-python

How to use empero-ai/Qwen3.5-9B-Claude-Code-GGUF with llama-cpp-python:

# !pip install llama-cpp-python

from llama_cpp import Llama

llm = Llama.from_pretrained(
	repo_id="empero-ai/Qwen3.5-9B-Claude-Code-GGUF",
	filename="Qwen3.5-9B-Claude-Code-Q4_K_M.gguf",  # recommended quant; Q2_K is listed as not recommended
)

llm.create_chat_completion(
	messages=[
		{"role": "user", "content": "Write a Python function that reverses a string."}
	]
)

Local Apps

llama.cpp

How to use empero-ai/Qwen3.5-9B-Claude-Code-GGUF with llama.cpp:

Install from brew

brew install llama.cpp
# Start a local OpenAI-compatible server with a web UI:
llama-server -hf empero-ai/Qwen3.5-9B-Claude-Code-GGUF:Q4_K_M
# Run inference directly in the terminal:
llama-cli -hf empero-ai/Qwen3.5-9B-Claude-Code-GGUF:Q4_K_M

Install from WinGet (Windows)

winget install llama.cpp
# Start a local OpenAI-compatible server with a web UI:
llama-server -hf empero-ai/Qwen3.5-9B-Claude-Code-GGUF:Q4_K_M
# Run inference directly in the terminal:
llama-cli -hf empero-ai/Qwen3.5-9B-Claude-Code-GGUF:Q4_K_M

Use pre-built binary

# Download pre-built binary from:
# https://github.com/ggerganov/llama.cpp/releases
# Start a local OpenAI-compatible server with a web UI:
./llama-server -hf empero-ai/Qwen3.5-9B-Claude-Code-GGUF:Q4_K_M
# Run inference directly in the terminal:
./llama-cli -hf empero-ai/Qwen3.5-9B-Claude-Code-GGUF:Q4_K_M

Build from source code

git clone https://github.com/ggerganov/llama.cpp.git
cd llama.cpp
cmake -B build
cmake --build build -j --target llama-server llama-cli
# Start a local OpenAI-compatible server with a web UI:
./build/bin/llama-server -hf empero-ai/Qwen3.5-9B-Claude-Code-GGUF:Q4_K_M
# Run inference directly in the terminal:
./build/bin/llama-cli -hf empero-ai/Qwen3.5-9B-Claude-Code-GGUF:Q4_K_M

Use Docker

docker model run hf.co/empero-ai/Qwen3.5-9B-Claude-Code-GGUF:Q4_K_M

Ollama
How to use empero-ai/Qwen3.5-9B-Claude-Code-GGUF with Ollama:
```
ollama run hf.co/empero-ai/Qwen3.5-9B-Claude-Code-GGUF:Q4_K_M
```

Unsloth Studio

How to use empero-ai/Qwen3.5-9B-Claude-Code-GGUF with Unsloth Studio:

Install Unsloth Studio (macOS, Linux, WSL)

curl -fsSL https://unsloth.ai/install.sh | sh
# Run unsloth studio
unsloth studio -H 0.0.0.0 -p 8888
# Then open http://localhost:8888 in your browser
# Search for empero-ai/Qwen3.5-9B-Claude-Code-GGUF to start chatting

Install Unsloth Studio (Windows)

irm https://unsloth.ai/install.ps1 | iex
# Run unsloth studio
unsloth studio -H 0.0.0.0 -p 8888
# Then open http://localhost:8888 in your browser
# Search for empero-ai/Qwen3.5-9B-Claude-Code-GGUF to start chatting

Using HuggingFace Spaces for Unsloth

# No setup required
# Open https://huggingface.co/spaces/unsloth/studio in your browser
# Search for empero-ai/Qwen3.5-9B-Claude-Code-GGUF to start chatting

Pi

How to use empero-ai/Qwen3.5-9B-Claude-Code-GGUF with Pi:

Start the llama.cpp server

# Install llama.cpp:
brew install llama.cpp
# Start a local OpenAI-compatible server:
llama-server -hf empero-ai/Qwen3.5-9B-Claude-Code-GGUF:Q4_K_M

Configure the model in Pi

# Install Pi:
npm install -g @mariozechner/pi-coding-agent
# Add to ~/.pi/agent/models.json:
{
  "providers": {
    "llama-cpp": {
      "baseUrl": "http://localhost:8080/v1",
      "api": "openai-completions",
      "apiKey": "none",
      "models": [
        {
          "id": "empero-ai/Qwen3.5-9B-Claude-Code-GGUF:Q4_K_M"
        }
      ]
    }
  }
}

Run Pi

# Start Pi in your project directory:
pi

Hermes Agent

How to use empero-ai/Qwen3.5-9B-Claude-Code-GGUF with Hermes Agent:

Start the llama.cpp server

# Install llama.cpp:
brew install llama.cpp
# Start a local OpenAI-compatible server:
llama-server -hf empero-ai/Qwen3.5-9B-Claude-Code-GGUF:Q4_K_M

Configure Hermes

# Install Hermes:
curl -fsSL https://hermes-agent.nousresearch.com/install.sh | bash
hermes setup
# Point Hermes at the local server:
hermes config set model.provider custom
hermes config set model.base_url http://127.0.0.1:8080/v1
hermes config set model.default empero-ai/Qwen3.5-9B-Claude-Code-GGUF:Q4_K_M

Run Hermes

hermes

Docker Model Runner
How to use empero-ai/Qwen3.5-9B-Claude-Code-GGUF with Docker Model Runner:
```
docker model run hf.co/empero-ai/Qwen3.5-9B-Claude-Code-GGUF:Q4_K_M
```

Lemonade

How to use empero-ai/Qwen3.5-9B-Claude-Code-GGUF with Lemonade:

Pull the model

# Download Lemonade from https://lemonade-server.ai/
lemonade pull empero-ai/Qwen3.5-9B-Claude-Code-GGUF:Q4_K_M

Run and chat with the model

lemonade run user.Qwen3.5-9B-Claude-Code-GGUF-Q4_K_M

List all available models

lemonade list

CodeClawd — Qwen3.5-9B Claude Code (GGUF)

GGUF quantizations of empero-ai/Qwen3.5-9B-Claude-Code, a supervised fine-tune of empero-ai/Qwen3.5-9B-Claude-Opus-4.6-Distill on real Claude Code / Codex agentic sessions from the DataClaw dataset collection.

The model is trained to behave like a capable coding agent: thinking through problems, calling tools, interpreting results, and responding clearly — mirroring the full agentic loop of a real Claude Code session.

The model thinks inside <think> tags and calls tools with XML-style <tool_call> blocks.

Available Quantizations

| Filename | Quant | Size | RAM Required | Quality |
|---|---|---|---|---|
| `Qwen3.5-9B-Claude-Code-Q2_K.gguf` | Q2_K | ~3.6 GB | ~6 GB | Lowest (not recommended) |
| `Qwen3.5-9B-Claude-Code-Q3_K_M.gguf` | Q3_K_M | ~4.5 GB | ~7 GB | Low |
| `Qwen3.5-9B-Claude-Code-Q4_0.gguf` | Q4_0 | ~5.2 GB | ~8 GB | Acceptable |
| `Qwen3.5-9B-Claude-Code-Q4_K_M.gguf` | Q4_K_M | ~5.4 GB | ~8 GB | Recommended (best size/quality tradeoff) |
| `Qwen3.5-9B-Claude-Code-Q5_K_M.gguf` | Q5_K_M | ~6.4 GB | ~9 GB | High |
| `Qwen3.5-9B-Claude-Code-Q6_K.gguf` | Q6_K | ~7.3 GB | ~10 GB | Very high |
| `Qwen3.5-9B-Claude-Code-Q8_0.gguf` | Q8_0 | ~9.5 GB | ~12 GB | Near-lossless |
| `Qwen3.5-9B-Claude-Code-F16.gguf` | F16 | ~18 GB | ~22 GB | Lossless |

Recommended: Q4_K_M for most users. Choose Q5_K_M or Q6_K if you have memory to spare (RAM, or VRAM when offloading to a GPU) and want higher-quality reasoning.
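
The table above can also be read as a simple lookup. The following is a minimal sketch, assuming the approximate "RAM Required" figures listed above are the only constraint; `QUANTS`, `pick_quant`, and `gguf_filename` are hypothetical names introduced for illustration and are not part of any library.

```python
# Approximate "RAM Required" figures (GB) copied from the table above,
# ordered from lowest to highest quality.
QUANTS = [
    ("Q2_K", 6),
    ("Q3_K_M", 7),
    ("Q4_0", 8),
    ("Q4_K_M", 8),
    ("Q5_K_M", 9),
    ("Q6_K", 10),
    ("Q8_0", 12),
    ("F16", 22),
]


def pick_quant(available_ram_gb):
    """Return the highest-quality quant whose listed RAM figure fits, or None."""
    best = None
    for name, ram_gb in QUANTS:
        if ram_gb <= available_ram_gb:
            best = name  # later entries in QUANTS are higher quality
    return best


def gguf_filename(quant):
    """Build the filename using the naming pattern from the table."""
    return f"Qwen3.5-9B-Claude-Code-{quant}.gguf"


if __name__ == "__main__":
    quant = pick_quant(8)
    print(quant, gguf_filename(quant))
```

The figures are rough, so treat the result as a starting point and leave headroom for the context window and other processes.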

Model Details

| Property | Value |
|---|---|
| Original model | `empero-ai/Qwen3.5-9B-Claude-Code` |
| Base model | `empero-ai/Qwen3.5-9B-Claude-Opus-4.6-Distill` |
| Fine-tune type | SFT (LoRA r=64/alpha=128, merged) |
| Parameters | 9B |
| Training data | DataClaw (29 datasets of real Claude Code / Codex sessions) |
| Context length | 4096 tokens |
| License | Apache 2.0 |

Usage with llama.cpp

# Basic inference
./llama-cli \
  -m Qwen3.5-9B-Claude-Code-Q4_K_M.gguf \
  -n 1024 \
  --temp 0.7 \
  --top-p 0.9 \
  -p "<|im_start|>system
You are an expert AI coding assistant. You help users with software engineering tasks including writing code, debugging, refactoring, explaining code, and more. You think through problems carefully inside <think> tags before acting. You have access to tools like Read, Edit, Write, Bash, Grep, Glob, and others to interact with the user's codebase. You use tools when needed and provide clear, concise responses.<|im_end|>
<|im_start|>user
Read the file main.py and explain what it does.<|im_end|>
<|im_start|>assistant
"

# As an OpenAI-compatible server
./llama-server \
  -m Qwen3.5-9B-Claude-Code-Q4_K_M.gguf \
  --host 0.0.0.0 \
  --port 8080 \
  -c 4096 \
  -ngl 99
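
Once the server is running, any OpenAI-style client can talk to it. Below is a minimal sketch using only the Python standard library, assuming the server is reachable at `http://localhost:8080` as configured above and exposes the usual `/v1/chat/completions` route; `build_chat_request` and `chat` are hypothetical helper names, and the sampling values mirror the ones used elsewhere in this card.

```python
import json
import urllib.request

# Assumption: llama-server is running locally on the port used above.
BASE_URL = "http://localhost:8080/v1"


def build_chat_request(user_message, system_prompt, max_tokens=1024):
    """Build the JSON body for POST /v1/chat/completions."""
    return {
        "messages": [
            {"role": "system", "content": system_prompt},
            {"role": "user", "content": user_message},
        ],
        "temperature": 0.7,
        "top_p": 0.9,
        "max_tokens": max_tokens,
    }


def chat(user_message, system_prompt, base_url=BASE_URL):
    """Send one chat turn and return the assistant's raw text."""
    body = json.dumps(build_chat_request(user_message, system_prompt)).encode("utf-8")
    request = urllib.request.Request(
        f"{base_url}/chat/completions",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request) as response:
        payload = json.load(response)
    return payload["choices"][0]["message"]["content"]
```

For example, `chat("Read the file main.py and explain what it does.", "You are an expert AI coding assistant.")` would return the model's reply as a string, which may still contain `<think>` and `<tool_call>` markup that the caller has to handle.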

Usage with Ollama

Create a Modelfile:

FROM Qwen3.5-9B-Claude-Code-Q4_K_M.gguf

SYSTEM """You are an expert AI coding assistant. You help users with software engineering tasks including writing code, debugging, refactoring, explaining code, and more. You think through problems carefully inside <think> tags before acting. You have access to tools like Read, Edit, Write, Bash, Grep, Glob, and others to interact with the user's codebase. You use tools when needed and provide clear, concise responses. You show your reasoning process and use tools methodically to accomplish tasks."""

PARAMETER temperature 0.7
PARAMETER top_p 0.9
PARAMETER num_ctx 4096

Then build and run the model:

ollama create codeclawd -f Modelfile
ollama run codeclawd

Usage with LM Studio

1. Download the .gguf file of your chosen quantization.
2. Open LM Studio → Load Model → select the file.
3. In the system prompt field, paste:

You are an expert AI coding assistant. You help users with software engineering tasks including writing code, debugging, refactoring, explaining code, and more. You think through problems carefully inside <think> tags before acting. You have access to tools like Read, Edit, Write, Bash, Grep, Glob, and others to interact with the user's codebase. You use tools when needed and provide clear, concise responses. You show your reasoning process and use tools methodically to accomplish tasks.

4. Set temperature: 0.7, top-p: 0.9, context: 4096.

Agentic Format

This model is trained on the full agentic loop:

user → <think>...</think> → <tool_call> → <tool_result> → response → repeat

Tool Call Format

<tool_call>
<tool>Read</tool>
<input>
{"file_path": "/path/to/file.py"}
</input>
</tool_call>

<tool_result>
<tool>Read</tool>
<output>
... file contents ...
</output>
</tool_result>
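
Because the format is plain XML-style text, the calling application has to extract tool calls itself. The following is a minimal sketch of one way to do that, assuming the model emits blocks exactly as shown above; `parse_tool_calls` and `format_tool_result` are hypothetical helper names, and a regular expression is used here for brevity rather than a full XML parser.

```python
import json
import re

# Matches one <tool_call> block in the format shown above.
TOOL_CALL_RE = re.compile(
    r"<tool_call>\s*<tool>(?P<tool>.*?)</tool>\s*"
    r"<input>(?P<input>.*?)</input>\s*</tool_call>",
    re.DOTALL,
)


def parse_tool_calls(text):
    """Return a list of (tool_name, arguments) pairs found in model output."""
    calls = []
    for match in TOOL_CALL_RE.finditer(text):
        name = match.group("tool").strip()
        try:
            arguments = json.loads(match.group("input"))
        except json.JSONDecodeError:
            continue  # skip malformed input rather than guess at a repair
        calls.append((name, arguments))
    return calls


def format_tool_result(tool_name, output):
    """Wrap executor output in the <tool_result> format shown above."""
    return (
        "<tool_result>\n"
        f"<tool>{tool_name}</tool>\n"
        "<output>\n"
        f"{output}\n"
        "</output>\n"
        "</tool_result>"
    )
```

The formatted result would then be sent back to the model as the next turn so it can continue the loop.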

Thinking Format

<think>
The user wants to refactor the auth module. I should first read the existing
implementation before suggesting changes.
</think>
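
A front end will usually want to hide or collapse the reasoning. This is a minimal sketch, assuming `<think>` blocks are well-formed and not nested; `split_thinking` is a hypothetical helper name.

```python
import re

# Matches one <think>...</think> block, including newlines inside it.
THINK_RE = re.compile(r"<think>(.*?)</think>", re.DOTALL)


def split_thinking(text):
    """Return (thoughts, visible_text) for output containing <think> blocks."""
    thoughts = [block.strip() for block in THINK_RE.findall(text)]
    visible = THINK_RE.sub("", text).strip()
    return thoughts, visible
```

An unclosed `<think>` tag (for example, when generation is cut off by the token limit) is left in the visible text by this sketch, so a real integration may want to handle that case separately.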

Intended Use

Local agentic coding assistants on consumer hardware
IDE integrations (Continue, Cursor, etc.) running locally
Offline / air-gapped development environments
Research into agentic SFT distillation from real sessions

Limitations

Tool format is XML-based — not OpenAI function-call JSON
SFT only (no RLHF or DPO)
May hallucinate tool outputs if not connected to a real tool executor
Quantization below Q4 may degrade reasoning quality noticeably
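
To avoid hallucinated tool outputs, each parsed tool call should be run by real code and the actual result fed back to the model. The following is a minimal sketch of such an executor for the Read tool only, assuming its input carries a `file_path` key as in the example above; `read_tool`, `EXECUTORS`, and `execute_tool` are hypothetical names, and a real deployment would need sandboxing and path restrictions that are omitted here.

```python
from pathlib import Path


def read_tool(arguments):
    """Hypothetical executor for the Read tool: return the file's text."""
    return Path(arguments["file_path"]).read_text(encoding="utf-8")


# Hypothetical registry mapping tool names to executor functions.
EXECUTORS = {"Read": read_tool}


def execute_tool(name, arguments):
    """Run a tool and return its output, or an error string the model can read."""
    executor = EXECUTORS.get(name)
    if executor is None:
        return f"Error: unknown tool {name!r}"
    try:
        return executor(arguments)
    except (OSError, KeyError) as exc:
        # Report the failure to the model instead of raising.
        return f"Error: {exc}"
```

Returning errors as text keeps the loop going and lets the model decide how to recover, instead of aborting the session on the first failed call.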

Original Model

HF (safetensors): empero-ai/Qwen3.5-9B-Claude-Code

License

Apache 2.0 — see base model for details.


GGUF

Model size: 9B params
Architecture: qwen35


Model tree for empero-ai/Qwen3.5-9B-Claude-Code-GGUF

Base model: Qwen/Qwen3.5-9B-Base
Finetuned: Qwen/Qwen3.5-9B
Finetuned: empero-ai/Qwen3.5-9B-Claude-Opus-4.6-Distill
Quantized: this model


Collection including empero-ai/Qwen3.5-9B-Claude-Code-GGUF

Qwen3.5: a collection of our Qwen3.5 finetunes (4 items, updated Mar 24)