# mach-kernel/gemma-3-12b-it-antislop-4b-mlx
MLX-optimized quantized version of sam-paech/gemma-3-12b-it-antislop for Apple Silicon.
## Quantization Details
| Setting | Value |
|---|---|
| Method | Mixed-precision (4-bit + 6-bit) |
| Predicate | mixed_4_6 |
| Group Size | 32 |
| Avg Bits/Weight | ~4.9 |
| mlx-lm Version | 0.24.1 |
**Why `mixed_4_6`?** This quantization strategy keeps sensitive layers at 6-bit precision while using 4-bit for less critical layers, providing better accuracy than uniform 4-bit quantization with minimal size increase.
**Why group size 32?** Smaller group sizes (32 vs. the mlx-lm default of 64) give each quantization scale fewer weights to cover, providing finer granularity and reducing quality loss at a slight memory overhead.
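As a back-of-the-envelope check on the ~4.9 figure, average bits per weight is just the bit widths weighted by how many weights land in each bucket, plus (depending on how you count) the per-group metadata. The sketch below is illustrative only: the 6-bit fractions are hypothetical, and fp16 storage for the per-group scale and bias is an assumption about the MLX affine quantization format.

```python
# Rough bits-per-weight estimate for a mixed 4-/6-bit quantization.
# The frac_6bit values are hypothetical; the real mixed_4_6 recipe
# decides bit width per layer.
GROUP_SIZE = 32
SCALE_AND_BIAS_BITS = 16 + 16  # fp16 scale + fp16 bias per group (assumption)

def avg_bits(frac_6bit: float, count_overhead: bool = True) -> float:
    """Weighted average of 4- and 6-bit weights, optionally amortizing
    the per-group metadata over GROUP_SIZE weights."""
    bits = (1 - frac_6bit) * 4 + frac_6bit * 6
    if count_overhead:
        bits += SCALE_AND_BIAS_BITS / GROUP_SIZE  # +1 bit/weight at group 32
    return bits

for frac in (0.10, 0.25, 0.45):
    print(f"{frac:.0%} 6-bit: {avg_bits(frac, False):.2f} raw, "
          f"{avg_bits(frac):.2f} with group overhead")
```

The point is only that the headline number falls between the pure 4-bit and pure 6-bit endpoints, and that group size 32 costs about one extra bit per weight in metadata, versus half a bit at the default group size of 64.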
## Conversion Command
```bash
mlx_lm.convert \
    --hf-path sam-paech/gemma-3-12b-it-antislop \
    --mlx-path ./output \
    -q \
    --quant-predicate mixed_4_6 \
    --q-group-size 32
```
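The same conversion can be scripted from Python. The sketch below assumes this mlx-lm version's `convert` function accepts the predicate by name the way the CLI does; if yours expects a callable instead, the CLI invocation above is the reliable path.

```python
from mlx_lm import convert

convert(
    hf_path="sam-paech/gemma-3-12b-it-antislop",
    mlx_path="./output",
    quantize=True,                # equivalent to the -q flag
    q_group_size=32,
    quant_predicate="mixed_4_6",  # assumption: string form accepted as in the CLI
)
```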
## Usage
```bash
pip install mlx-lm
```
```python
from mlx_lm import load, generate

model, tokenizer = load("mach-kernel/gemma-3-12b-it-antislop-4b-mlx")

prompt = "hello"

if tokenizer.chat_template is not None:
    messages = [{"role": "user", "content": prompt}]
    prompt = tokenizer.apply_chat_template(
        messages, add_generation_prompt=True
    )

response = generate(model, tokenizer, prompt=prompt, verbose=True)
```
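For a quick smoke test without any Python, the same pip install also provides a CLI entry point:

```bash
mlx_lm.generate \
    --model mach-kernel/gemma-3-12b-it-antislop-4b-mlx \
    --prompt "hello"
```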
## About the Base Model
This is a quantized version of [sam-paech/gemma-3-12b-it-antislop](https://huggingface.co/sam-paech/gemma-3-12b-it-antislop), a Gemma 3 12B instruct model fine-tuned to reduce repetitive and clichéd language patterns ("slop").