Instructions for using allura-org/Q3-30B-A3B-Designant with libraries and local apps.

Libraries

How to use allura-org/Q3-30B-A3B-Designant with Transformers:

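```python
# Use a pipeline as a high-level helper
from transformers import pipeline

# torch_dtype="auto" keeps the BF16 weights; device_map="auto" requires accelerate.
pipe = pipeline("text-generation", model="allura-org/Q3-30B-A3B-Designant",
                torch_dtype="auto", device_map="auto")
messages = [
    {"role": "user", "content": "Who are you?"},
]
print(pipe(messages, max_new_tokens=40))
```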

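```python
# Load model directly
from transformers import AutoTokenizer, AutoModelForCausalLM

model_id = "allura-org/Q3-30B-A3B-Designant"
tokenizer = AutoTokenizer.from_pretrained(model_id)
# torch_dtype="auto" keeps the BF16 checkpoint dtype; device_map="auto"
# (requires the accelerate package) spreads weights across available devices.
model = AutoModelForCausalLM.from_pretrained(model_id, torch_dtype="auto", device_map="auto")
messages = [
    {"role": "user", "content": "Who are you?"},
]
inputs = tokenizer.apply_chat_template(
    messages,
    add_generation_prompt=True,
    tokenize=True,
    return_dict=True,
    return_tensors="pt",
).to(model.device)

outputs = model.generate(**inputs, max_new_tokens=40)
# Decode only the newly generated tokens, skipping the prompt.
print(tokenizer.decode(outputs[0][inputs["input_ids"].shape[-1]:], skip_special_tokens=True))
```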

Local Apps

vLLM

How to use allura-org/Q3-30B-A3B-Designant with vLLM:

Install from pip and serve model

```
# Install vLLM from pip:
pip install vllm
# Start the vLLM server:
vllm serve "allura-org/Q3-30B-A3B-Designant"
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:8000/v1/chat/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "allura-org/Q3-30B-A3B-Designant",
		"messages": [
			{
				"role": "user",
				"content": "What is the capital of France?"
			}
		]
	}'
```
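The same OpenAI-compatible endpoint can be called from Python. The sketch below is a minimal example using only the standard library (`urllib`, `json`), assuming the vLLM server above is running on its default port 8000; for the SGLang server, swap in port 30000. The helper names `build_chat_request` and `chat` are hypothetical, and the response parsing assumes the standard OpenAI chat-completions shape.

```python
import json
import urllib.request

# Assumption: `vllm serve` is listening on its default port 8000.
# For the SGLang server, use "http://localhost:30000/v1" instead.
BASE_URL = "http://localhost:8000/v1"
MODEL = "allura-org/Q3-30B-A3B-Designant"


def build_chat_request(base_url: str, model: str, messages: list[dict]) -> urllib.request.Request:
    """Build a POST request for the OpenAI-compatible /chat/completions endpoint."""
    payload = {"model": model, "messages": messages}
    return urllib.request.Request(
        url=f"{base_url}/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def chat(base_url: str, model: str, messages: list[dict]) -> str:
    """Send the request and return the assistant's reply text."""
    req = build_chat_request(base_url, model, messages)
    with urllib.request.urlopen(req) as resp:
        body = json.load(resp)
    # OpenAI-style responses put the reply under choices[0].message.content.
    return body["choices"][0]["message"]["content"]
```

With the server running, `chat(BASE_URL, MODEL, [{"role": "user", "content": "What is the capital of France?"}])` returns the reply as a string. Separating request construction from sending keeps the payload easy to inspect without a live server.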

Use Docker

```
docker model run hf.co/allura-org/Q3-30B-A3B-Designant
```

SGLang

How to use allura-org/Q3-30B-A3B-Designant with SGLang:

Install from pip and serve model

```
# Install SGLang from pip:
pip install sglang
# Start the SGLang server:
python3 -m sglang.launch_server \
    --model-path "allura-org/Q3-30B-A3B-Designant" \
    --host 0.0.0.0 \
    --port 30000
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/chat/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "allura-org/Q3-30B-A3B-Designant",
		"messages": [
			{
				"role": "user",
				"content": "What is the capital of France?"
			}
		]
	}'
```

Use Docker images

```
docker run --gpus all \
    --shm-size 32g \
    -p 30000:30000 \
    -v ~/.cache/huggingface:/root/.cache/huggingface \
    --env "HF_TOKEN=<secret>" \
    --ipc=host \
    lmsysorg/sglang:latest \
    python3 -m sglang.launch_server \
        --model-path "allura-org/Q3-30B-A3B-Designant" \
        --host 0.0.0.0 \
        --port 30000
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/chat/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "allura-org/Q3-30B-A3B-Designant",
		"messages": [
			{
				"role": "user",
				"content": "What is the capital of France?"
			}
		]
	}'
```

Unsloth Studio

How to use allura-org/Q3-30B-A3B-Designant with Unsloth Studio:

Install Unsloth Studio (macOS, Linux, WSL)

```
curl -fsSL https://unsloth.ai/install.sh | sh
# Run unsloth studio
unsloth studio -H 0.0.0.0 -p 8888
# Then open http://localhost:8888 in your browser
# Search for allura-org/Q3-30B-A3B-Designant to start chatting
```

Install Unsloth Studio (Windows)

```
irm https://unsloth.ai/install.ps1 | iex
# Run unsloth studio
unsloth studio -H 0.0.0.0 -p 8888
# Then open http://localhost:8888 in your browser
# Search for allura-org/Q3-30B-A3B-Designant to start chatting
```

Using Hugging Face Spaces for Unsloth

```
# No setup required
# Open https://huggingface.co/spaces/unsloth/studio in your browser
# Search for allura-org/Q3-30B-A3B-Designant to start chatting
```

Load model with FastModel

```
pip install unsloth
```

```python
from unsloth import FastModel

model, tokenizer = FastModel.from_pretrained(
    model_name="allura-org/Q3-30B-A3B-Designant",
    max_seq_length=2048,
)
```

Docker Model Runner
How to use allura-org/Q3-30B-A3B-Designant with Docker Model Runner:
```
docker model run hf.co/allura-org/Q3-30B-A3B-Designant
```
Browse Quantizations to use this model in llama.cpp, Ollama, LM Studio, or any compatible app.

Q3-30B-A3B-Designant

Made with NovelAI 4.5 Curated

She looked into His Spine, into His Heart; and she saw there the shade of His soul.

Overview

Intended as a direct upgrade to Pentiment, Q3-30B-A3B-Designant is a roleplaying model finetuned from Qwen3-30B-A3B-Base.

During testing, Designant punched well above its weight class for a model with only ~3B active parameters, demonstrating the potential of well-made lightweight Mixture of Experts models in the roleplay scene. One tester observed looping behavior, but repetition was otherwise minimal.

Quantizations

⚠️ Warning: Quantization seems very janky with Qwen 3 MoE models. We recommend using full bf16 weights and vLLM, if possible.

EXL3:

our EXL3 collection

MLX:

8bpw by soundTeam

GGUF:

Some users report even more issues with low-bit GGUF quants of Qwen3 MoE models. We recommend trying both imatrix and linear quants, and using Q5 or higher for proper quality.

Usage

The prompt format is plain-old ChatML. Unlike regular Qwen 3, you do not need to prefill empty think tags to stop it from reasoning (see below).
Settings used by testers varied, but Fizz and inflatebot used the same settings and system prompt recommended for GLM4-32B-Neon-v2.
The official instruction-following version of Qwen3-30B-A3B was not part of the merge. Instruction following was trained in post hoc, and no "thinking" traces were included, so "thinking" will likely not function as intended.
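Because the format is plain ChatML, a prompt can also be assembled by hand, for example for a raw text-completion endpoint. The sketch below assumes the standard ChatML layout (`<|im_start|>{role}\n{content}<|im_end|>\n`, ending with an open assistant turn); the chat template bundled with the model's tokenizer remains authoritative, and `to_chatml` is a hypothetical helper, not part of any library. Note that it does not prefill an empty `<think></think>` block.

```python
def to_chatml(messages: list[dict], add_generation_prompt: bool = True) -> str:
    """Render a list of {"role", "content"} dicts as a ChatML prompt string.

    Assumption: standard ChatML markers; no empty think-tag prefill is added,
    since this model was not trained with thinking traces.
    """
    parts = [f"<|im_start|>{m['role']}\n{m['content']}<|im_end|>\n" for m in messages]
    if add_generation_prompt:
        # Open the assistant turn so the model continues from here.
        parts.append("<|im_start|>assistant\n")
    return "".join(parts)
```

For instance, `to_chatml([{"role": "system", "content": "You are a narrator."}, {"role": "user", "content": "Begin."}])` yields the two closed turns followed by an open `<|im_start|>assistant` line.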
As with any Qwen3-30B-A3B derivative, Designant performs very adequately with few or no layers offloaded to the GPU. Using the ik_llama.cpp server, a Ryzen 9 7950X CPU with 32 GB of DDR5 RAM can run a Q4_K_M quant of this architecture at ~15 tokens/sec with no GPU involved at all.
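A rough weight-size estimate shows why a Q4_K_M quant fits in 32 GB of system RAM while full BF16 weights do not. In a Mixture of Experts model all ~31B parameters must stay resident, even though only ~3B are active per token (which is what keeps CPU inference fast). This sketch counts weights only, ignoring KV cache and runtime overhead; the ~4.8 bits per weight for Q4_K_M is an assumed average, since that format mixes several quant types.

```python
def weight_footprint_gb(n_params: float, bits_per_weight: float) -> float:
    """Approximate size of the weights alone, in decimal GB (1 GB = 1e9 bytes)."""
    return n_params * bits_per_weight / 8 / 1e9


N_PARAMS = 31e9  # "31B params" as listed on the model page (approximate)

# Bits per weight: BF16 and 8bpw are exact; Q4_K_M's ~4.8 is an assumption.
for label, bpw in [("BF16", 16.0), ("8bpw", 8.0), ("Q4_K_M (~4.8 bpw, assumed)", 4.8)]:
    print(f"{label}: ~{weight_footprint_gb(N_PARAMS, bpw):.1f} GB")
```

This prints roughly 62 GB for BF16, 31 GB at 8bpw, and under 19 GB for Q4_K_M, leaving headroom for context on a 32 GB machine.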

Training Process

The base model first went through a supervised finetune on a corpus of instruction-following data, roleplay conversations, and human writing based on the Ink/Bigger Body/Remnant lineage.
It was then lightly merged with Pantheon-Proto-RP-1.8 to improve stability.
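The card does not state the merge method or weight. As an illustration only, a "light" merge can be pictured as plain linear interpolation with a small weight on the second model, one of the methods offered by merge tools such as mergekit. The `linear_merge` function below is hypothetical, represents tensors as flat lists of floats, and requires both models to share tensor names and shapes.

```python
def linear_merge(base: dict[str, list[float]], other: dict[str, list[float]],
                 weight: float) -> dict[str, list[float]]:
    """Return (1 - weight) * base + weight * other, tensor by tensor.

    Hypothetical illustration: both models must have identical tensor
    names and shapes (i.e. the same architecture).
    """
    if not 0.0 <= weight <= 1.0:
        raise ValueError("weight must be in [0, 1]")
    if base.keys() != other.keys():
        raise ValueError("models must have the same tensor names")
    merged = {}
    for name, base_vals in base.items():
        other_vals = other[name]
        if len(base_vals) != len(other_vals):
            raise ValueError(f"shape mismatch for {name}")
        # Small weights keep the result close to the base (SFT) model.
        merged[name] = [(1.0 - weight) * b + weight * o for b, o in zip(base_vals, other_vals)]
    return merged
```

With `weight=0.25`, a base value of 1.0 and a partner value of 3.0 merge to 1.5: the result moves a quarter of the way toward the second model.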
Finally, a KTO (Kahneman-Tversky Optimization) reinforcement learning phase steered the model away from the heavily purple prose of the initial merge and improved its logical and spatial reasoning and its overall sense of "intelligence".
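KTO needs only a per-completion "desirable" or "undesirable" label rather than paired preferences, which suits steering away from a style like purple prose. Below is a minimal sketch of the per-example loss from the published KTO objective, not the authors' training code; the card gives no data or hyperparameters, so the `beta` and `lambda_d`/`lambda_u` defaults are illustrative assumptions. The reference point `kl_ref` (z_0) is treated as a constant here, as in the paper, where it is estimated separately from the batch.

```python
import math


def sigmoid(x: float) -> float:
    """Numerically stable logistic function."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)
    return z / (1.0 + z)


def kto_loss(policy_logratio: float, kl_ref: float, desirable: bool,
             beta: float = 0.1, lambda_d: float = 1.0, lambda_u: float = 1.0) -> float:
    """Per-example KTO loss.

    policy_logratio: log pi_theta(y|x) - log pi_ref(y|x) for this completion.
    kl_ref: reference point z_0, an estimate of KL(pi_theta || pi_ref).
    desirable: True if the completion is labeled good, False if bad.
    """
    if desirable:
        # Push the implied reward above the reference point.
        return lambda_d * (1.0 - sigmoid(beta * (policy_logratio - kl_ref)))
    # Push the implied reward below the reference point.
    return lambda_u * (1.0 - sigmoid(beta * (kl_ref - policy_logratio)))
```

When the policy and reference agree (`policy_logratio == kl_ref`), both branches give a loss of 0.5; raising the likelihood of a desirable completion lowers its loss, while doing the same for an undesirable one raises it.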

Credits

Fizz - Train, Merge, Data Wrangling
Toaster, OMGWTFBBQ, The Trashpanda Testing Crew - Testing
inflatebot - Model Card, Testing, Merging Consultation
Juahyori, Artus - Compute Funding
Gryphe, Alibaba - Making the original models as well as the ones used in the merge

Bot would like to thank the Allura community on Discord, especially Curse, Vagabond, Artus and Mawnipulator, for their companionship and moral support. You all mean the world to us.

There, God is not.


Model size: 31B params
Tensor type: BF16 (Safetensors)

Model tree for allura-org/Q3-30B-A3B-Designant

Finetunes: 1 model
Quantizations: 13 models
