mach-kernel/gemma-3-12b-it-antislop-4b-mlx

MLX-optimized quantized version of sam-paech/gemma-3-12b-it-antislop for Apple Silicon.

Quantization Details

Setting           Value
----------------  -------------------------------
Method            Mixed-precision (4-bit + 6-bit)
Predicate         mixed_4_6
Group Size        32
Avg Bits/Weight   ~4.9
mlx-lm Version    0.24.1

Why mixed_4_6? This quantization strategy keeps sensitive layers at 6-bit precision while using 4-bit for less critical layers, providing better accuracy than uniform 4-bit quantization with minimal size increase.
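As an illustration only (this is not mlx-lm's actual mixed_4_6 predicate, and the layer-name heuristic below is hypothetical), a mixed-precision predicate can be pictured as a function that maps each layer name to a bit width:

# Illustrative sketch of the idea behind a mixed-precision predicate.
# NOT the real mlx-lm mixed_4_6 implementation; the keyword heuristic is hypothetical.
SENSITIVE_KEYWORDS = ("embed", "lm_head", "norm")

def pick_bits(layer_name: str) -> int:
    """Keep assumed-sensitive layers at 6 bits, everything else at 4 bits."""
    return 6 if any(k in layer_name for k in SENSITIVE_KEYWORDS) else 4

for name in ("model.embed_tokens", "model.layers.0.self_attn.q_proj", "lm_head"):
    print(f"{name}: {pick_bits(name)}-bit")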

Why group-size 32? Smaller group sizes (32 vs. the default 128) give each quantization scale fewer weights to cover, reducing quality loss at the cost of a small size overhead from the extra per-group scales.
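A rough sketch of why this helps, assuming plain affine per-group quantization (not mlx-lm's exact kernels): each group shares one scale, so smaller groups let that scale track the local weight range more closely.

import numpy as np

def fake_quant(w, bits=4, group_size=32):
    """Affine per-group quantize/dequantize: one scale and offset per group."""
    g = w.reshape(-1, group_size)
    lo = g.min(axis=1, keepdims=True)
    hi = g.max(axis=1, keepdims=True)
    scale = (hi - lo) / (2**bits - 1)
    q = np.round((g - lo) / scale)
    return (q * scale + lo).reshape(-1)

rng = np.random.default_rng(0)
w = rng.normal(size=4096).astype(np.float32)
for gs in (128, 32):
    err = np.abs(w - fake_quant(w, bits=4, group_size=gs)).mean()
    print(f"group_size={gs}: mean abs error {err:.5f}")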

Conversion Command

mlx_lm.convert \
    --hf-path sam-paech/gemma-3-12b-it-antislop \
    --mlx-path ./output \
    -q \
    --quant-predicate mixed_4_6 \
    --q-group-size 32
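
After conversion, a quick sanity check is to load the local output directory and generate a short completion (a minimal sketch, assuming the same mlx_lm Python API shown under Usage below):

from mlx_lm import load, generate

# Load the freshly converted weights from the local ./output directory.
model, tokenizer = load("./output")
print(generate(model, tokenizer, prompt="Say hello.", max_tokens=32))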

Usage

Install the package:

pip install mlx-lm

Then load the model and generate from Python:

from mlx_lm import load, generate

model, tokenizer = load("mach-kernel/gemma-3-12b-it-antislop-4b-mlx")

prompt = "hello"

if tokenizer.chat_template is not None:
    messages = [{"role": "user", "content": prompt}]
    prompt = tokenizer.apply_chat_template(
        messages, add_generation_prompt=True
    )

response = generate(model, tokenizer, prompt=prompt, verbose=True)
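
To control response length, pass max_tokens (a standard mlx_lm.generate argument). Continuing the snippet above:

# Longer completion, reusing model, tokenizer, and prompt from the snippet above.
response = generate(model, tokenizer, prompt=prompt, max_tokens=256, verbose=True)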

About the Base Model

This is a quantized version of gemma-3-12b-it-antislop, a Gemma 3 12B instruct model fine-tuned to reduce repetitive or clichéd language patterns ("slop").
