Qwen3.5-0.8B Intent Classification

A lightweight, quantized GGUF model fine-tuned on top of Qwen3.5-0.8B for conversational intent classification. Designed to run efficiently on consumer hardware with no GPU required.


Model Details

Property       Value
Base Model     Qwen3.5-0.8B
Format         GGUF (Q4_K_M quantization)
Parameters     ~0.8 billion
File Size      529 MB
Architecture   Qwen 3.5
License        MIT
Task           Intent classification

Intended Use

This model is designed to classify user intents from conversational text. It is suitable for:

  • Chatbot routing and intent detection
  • Virtual assistant pipelines
  • Customer support automation
  • NLU (Natural Language Understanding) systems
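
As a rough illustration of the chatbot-routing use case, the sketch below dispatches a predicted intent label to a handler function. The label names (`book_flight`, `cancel_subscription`) and the handlers are hypothetical; this model card does not document the label set the model was trained on.

```python
from typing import Callable, Dict


# Hypothetical handlers; replace with real application logic.
def handle_book_flight(message: str) -> str:
    return f"[booking] {message}"


def handle_cancel_subscription(message: str) -> str:
    return f"[billing] {message}"


def handle_fallback(message: str) -> str:
    return f"[human agent] {message}"


# Hypothetical label set; the trained labels are not listed in this card.
ROUTES: Dict[str, Callable[[str], str]] = {
    "book_flight": handle_book_flight,
    "cancel_subscription": handle_cancel_subscription,
}


def route(intent_label: str, message: str) -> str:
    """Send the message to the handler for the predicted intent, or to the fallback."""
    handler = ROUTES.get(intent_label.strip().lower(), handle_fallback)
    return handler(message)


print(route("book_flight", "Book me a flight to New York for next Monday."))
print(route("something_else", "Tell me a joke."))
```

Routing unrecognized labels to a fallback keeps the pipeline predictable when a small model returns something outside the expected set.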

Quickstart

Using llama.cpp

# Download the model
huggingface-cli download Nikhil1581/qwen3.5-0.8b-intent-classification Qwen3.5-0.8B.Q4_K_M.gguf --local-dir ./models

# Run inference
./llama-cli -m ./models/Qwen3.5-0.8B.Q4_K_M.gguf \
  -p "Classify the intent of the following message: 'What is the weather like today?'" \
  -n 128

Using llama-cpp-python

from llama_cpp import Llama

llm = Llama(
    model_path="./models/Qwen3.5-0.8B.Q4_K_M.gguf",
    n_ctx=2048,
)

prompt = """You are an intent classification assistant. 
Classify the intent of the user message below into a single intent label.

User message: "Book me a flight to New York for next Monday."
Intent:"""

output = llm(prompt, max_tokens=64, stop=["\n"])
print(output["choices"][0]["text"].strip())
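
The raw completion may come back with stray quotes, punctuation, or inconsistent casing. A minimal post-processing sketch, assuming the application maintains its own list of allowed labels (the labels below are hypothetical and not taken from this model card):

```python
import re
from typing import Iterable


def normalize_label(raw_text: str, allowed_labels: Iterable[str], default: str = "unknown") -> str:
    """Map raw model text to one of the allowed labels, or to `default`."""
    # Use only the first line of the completion.
    lines = raw_text.strip().splitlines()
    first_line = lines[0] if lines else ""
    # Drop surrounding quotes and periods, lower-case, and join words with underscores.
    cleaned = first_line.strip().strip("\"'.").lower()
    candidate = re.sub(r"[\s\-]+", "_", cleaned)
    return candidate if candidate in set(allowed_labels) else default


labels = ["book_flight", "cancel_subscription"]  # hypothetical label set
print(normalize_label(" Book Flight.\n", labels))
print(normalize_label("I am not sure", labels))
```

The function can be applied directly to `output["choices"][0]["text"]` from the example above.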

Using Ollama

# Create a Modelfile
cat <<EOF > Modelfile
FROM ./models/Qwen3.5-0.8B.Q4_K_M.gguf
SYSTEM "You are an intent classification assistant. Given a user message, respond with the most appropriate intent label."
EOF

ollama create qwen-intent -f Modelfile
ollama run qwen-intent "Cancel my subscription"

Quantization Details

This model uses Q4_K_M quantization, a 4-bit k-quant scheme from llama.cpp that trades a small amount of accuracy for a much smaller file and faster CPU inference than 16-bit weights.

Format   Size     Notes
Q4_K_M   529 MB   Recommended (balanced)
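
A quick back-of-the-envelope check of what the file size implies per parameter. This assumes 1 MB = 10^6 bytes and takes the approximate 0.8 billion parameter count at face value, so the result is only indicative.

```python
file_size_bytes = 529 * 10**6  # 529 MB from the table, assuming 1 MB = 10^6 bytes
param_count = 0.8 * 10**9      # "~0.8 billion" from Model Details; approximate

# Average storage cost per parameter, in bits.
bits_per_weight = file_size_bytes * 8 / param_count
print(f"{bits_per_weight:.2f} bits per parameter")
```

The average comes out above a nominal 4 bits, which is likely because Q4_K_M keeps some tensors at higher precision and the GGUF file also carries metadata and the tokenizer vocabulary.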

Hardware Requirements

Setup         Minimum Memory
CPU only      4 GB RAM
GPU offload   2 GB VRAM
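
With llama-cpp-python, the choice between the two setups is made through the `n_gpu_layers` constructor argument. The sketch below is a minimal example, assuming a GPU-enabled build of llama-cpp-python for the offload case; the helper names `llama_kwargs` and `load_model` are illustrative and not part of any library.

```python
def llama_kwargs(model_path: str, use_gpu: bool) -> dict:
    """Build constructor arguments for llama_cpp.Llama."""
    return {
        "model_path": model_path,
        "n_ctx": 2048,
        # -1 asks llama.cpp to offload all layers to the GPU; 0 keeps everything on the CPU.
        "n_gpu_layers": -1 if use_gpu else 0,
    }


def load_model(model_path: str, use_gpu: bool):
    """Load the GGUF file; importing here keeps the helper above usable without llama_cpp."""
    from llama_cpp import Llama
    return Llama(**llama_kwargs(model_path, use_gpu))
```

For example, `load_model("./models/Qwen3.5-0.8B.Q4_K_M.gguf", use_gpu=False)` would run entirely on the CPU.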

Limitations

  • Output quality depends on prompt formatting; clear, structured prompts yield better results.
  • As a 0.8B parameter model, performance on complex or ambiguous intents may be limited compared to larger models.
  • Primarily optimized for English-language inputs.
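
One way to keep prompts clear and structured is to build them from a template that names the candidate labels and optionally includes a few labeled examples. This is a sketch under assumptions: the label names are hypothetical, and whether listing labels or adding examples helps this particular model has not been verified here.

```python
from typing import Sequence, Tuple


def build_prompt(
    message: str,
    labels: Sequence[str],
    examples: Sequence[Tuple[str, str]] = (),
) -> str:
    """Build a single-label classification prompt that ends in 'Intent:'."""
    lines = [
        "You are an intent classification assistant.",
        "Classify the user message into exactly one of these intent labels: "
        + ", ".join(labels) + ".",
        "",
    ]
    # Optional few-shot examples, each followed by a blank line.
    for example_message, example_label in examples:
        lines += [f'User message: "{example_message}"', f"Intent: {example_label}", ""]
    lines += [f'User message: "{message}"', "Intent:"]
    return "\n".join(lines)


prompt = build_prompt(
    "Cancel my subscription",
    labels=["book_flight", "cancel_subscription"],  # hypothetical label set
    examples=[("Book me a flight to New York.", "book_flight")],
)
print(prompt)
```

Because the prompt ends with `Intent:`, it can be passed to the llama-cpp-python call shown in the Quickstart with `stop=["\n"]` so that generation ends after one label.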

Citation

If you use this model in your work, please cite:

@misc{nikhil1581-qwen3.5-intent,
  author       = {Nikhil1581},
  title        = {Qwen3.5-0.8B Intent Classification (GGUF)},
  year         = {2026},
  publisher    = {Hugging Face},
  url          = {https://huggingface.co/Nikhil1581/qwen3.5-0.8b-intent-classification}
}

Acknowledgements

Built on top of Qwen3.5 by Alibaba Cloud. Quantized using llama.cpp.
