This is an AWQ-format (Activation-aware Weight Quantization) W4A16 build of Qwen/Qwen3-4B-Instruct-2507: weights are stored in 4 bits while activations stay in 16 bits.
It was generated using Intel's AutoRound algorithm, which optimizes the weight rounding to minimize quantization loss. The aim is to preserve accuracy better than a standard AWQ conversion.
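To make "weight rounding" concrete, here is a minimal sketch of plain round-to-nearest (RTN) 4-bit quantization of a single weight group, written in pure Python. This is not AutoRound itself: AutoRound tunes the rounding decisions and clipping ranges rather than always rounding to the nearest level. The function names `quantize_group` and `dequantize_group` and the sample weights are hypothetical and chosen only for illustration.

```python
def quantize_group(weights, bits=4):
    """Asymmetric round-to-nearest quantization of one group of weights.

    Baseline RTN only; AutoRound would adjust these rounding decisions.
    """
    qmax = (1 << bits) - 1                      # 15 for 4-bit codes
    w_min, w_max = min(weights), max(weights)
    scale = (w_max - w_min) / qmax
    if scale == 0.0:                            # constant group: avoid dividing by zero
        scale = 1.0
    zero_point = round(-w_min / scale)          # integer code that represents 0.0
    codes = [min(qmax, max(0, round(w / scale) + zero_point)) for w in weights]
    return codes, scale, zero_point


def dequantize_group(codes, scale, zero_point):
    """Map integer codes back to approximate floating-point weights."""
    return [(c - zero_point) * scale for c in codes]


if __name__ == "__main__":
    # Hypothetical weights for illustration only.
    group = [0.12, -0.53, 0.91, 0.07, -0.88, 0.34, 0.50, -0.25]
    codes, scale, zp = quantize_group(group)
    restored = dequantize_group(codes, scale, zp)
    max_err = max(abs(a - b) for a, b in zip(group, restored))
    print("codes:", codes)
    print("max abs error:", max_err, "(half a step is", scale / 2, ")")
```

With RTN, each weight lands within half a quantization step of its original value. Rounding-optimization methods accept a larger per-weight error on some weights when doing so reduces the error of the layer's output.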
The model is compatible with vLLM, TGI, and Transformers.
It is optimized for high-throughput serving with vLLM.
pip install vllm
from vllm import LLM, SamplingParams

model_id = "Vishva007/Qwen3-4B-Instruct-2507-W4A16-AutoRound-AWQ"

# Load the AWQ checkpoint with FP16 activations.
llm = LLM(
    model=model_id,
    quantization="awq",
    dtype="half",
    max_model_len=8192,
    gpu_memory_utilization=0.90,
)

prompts = ["What is the capital of France?"]
sampling_params = SamplingParams(temperature=0.7, top_p=0.8)

outputs = llm.generate(prompts, sampling_params)
for output in outputs:
    # Each RequestOutput holds a list of completions; print the first one.
    print(output.outputs[0].text)
pip install autoawq transformers
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "Vishva007/Qwen3-4B-Instruct-2507-W4A16-AutoRound-AWQ"

model = AutoModelForCausalLM.from_pretrained(
    model_id,
    device_map="auto",
)
tokenizer = AutoTokenizer.from_pretrained(model_id)

# Wrap the prompt in the model's chat template.
prompt = "Write a python function to reverse a string."
messages = [{"role": "user", "content": prompt}]
text = tokenizer.apply_chat_template(
    messages,
    tokenize=False,
    add_generation_prompt=True,
)

inputs = tokenizer(text, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=256)

# generate() returns a batch of sequences; decode the first (it includes the prompt tokens).
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
This model is intended to retain the strong performance of the Qwen3-4B-Instruct-2507 base model, including its updated reasoning and coding capabilities, although some quality loss from 4-bit quantization is possible.
| Model | Format | VRAM (Est.) |
|---|---|---|
| Qwen3-4B-Instruct (BF16) | Original | ~9 GB |
| Qwen3-4B-Instruct (AWQ) | 4-bit | ~3.5 GB |
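As a rough cross-check of the table, the weight-only footprint can be estimated from the parameter count and the bits stored per weight. The sketch below assumes a nominal 4.0 billion parameters (taken from the "4B" in the model name), a quantization group size of 128, and one 16-bit scale plus one 4-bit zero point per group. None of these values are stated in this card, and `weight_memory_gb` is a hypothetical helper.

```python
def weight_memory_gb(num_params, bits_per_weight, group_size=None,
                     scale_bits=16, zero_bits=4):
    """Estimate weight-only memory in GB (1 GB = 1e9 bytes).

    If group_size is given, add the per-group scale and zero-point
    overhead, amortized over the weights in each group.
    """
    overhead_bits = 0.0
    if group_size:
        overhead_bits = (scale_bits + zero_bits) / group_size
    total_bits = num_params * (bits_per_weight + overhead_bits)
    return total_bits / 8 / 1e9


if __name__ == "__main__":
    params = 4.0e9  # assumed nominal parameter count
    print("BF16 weights :", weight_memory_gb(params, 16), "GB")
    print("4-bit weights:", weight_memory_gb(params, 4, group_size=128), "GB")
```

These figures cover weights only, so they come out lower than the table's estimates. Actual VRAM use also includes activations, the KV cache, and framework overhead, and some layers may be kept in higher precision.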
@misc{qwen3technicalreport,
  title={Qwen3 Technical Report},
  author={Qwen Team},
  year={2025},
  eprint={2505.09388},
  archivePrefix={arXiv},
  primaryClass={cs.CL}
}
Base model: Qwen/Qwen3-4B-Instruct-2507