GLM-5.1-Abliterated - FP8

Z.ai's flagship 754B MoE — among the strongest open-weight models available, with coding performance reported to match or exceed frontier closed models — with the refusal layer surgically removed and capability healed back. Optimised over 250 multi-objective Optuna trials. Deviation on standard benchmarks is pending external evaluation; sweep-time numbers are in the headline table below.

The huihui-ai Q3_K_M abliteration of GLM-5.1 was the only public option until now — produced with what the authors themselves called a "crude" pipeline, and never released as a full-precision artifact. This is the answer.


TL;DR (read this if you don't know what abliteration is)

GLM-5.1 is a 754B-parameter Mixture-of-Experts language model (~754 GB on disk) from Z.ai (China). Like every modern frontier model, it ships with a "safety layer" — a learned reflex to refuse certain prompts. Abliteration is a surgical technique that removes that reflex by identifying the single direction in the model's hidden state that encodes "I should refuse this", and projecting it out of every attention layer's output weights.
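
For intuition, the core edit is a rank-1 projection of each attention output projection's weight. A minimal sketch, not the release pipeline's exact code; refusal_dir stands in for the extracted direction:

import torch

def ablate_direction(W: torch.Tensor, refusal_dir: torch.Tensor) -> torch.Tensor:
    # W: o_proj weight of shape (hidden_size, in_features);
    # refusal_dir: the extracted refusal direction of shape (hidden_size,).
    d = refusal_dir / refusal_dir.norm()
    return W - torch.outer(d, d @ W)   # W' = (I - d dᵀ) W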

Done crudely, abliteration tanks the model's intelligence. Done well, it removes the refusals while preserving (or sometimes improving) capability on benign tasks.

This release is done well: 250 Optuna-optimised trial runs to find the Pareto-optimal trade-off, a rank-4 LoRA healing pass on top, and the result merged back into the FP8 weights as a single drop-in replacement for the base model. Same memory footprint, same speed, no LoRA loading dance — just from_pretrained() and go.


Headline numbers

| Metric | Base GLM-5.1-FP8 | This release |
|---|---|---|
| Refusal rate (sweep-time eval, n=30) | high (close to 100% on harmful set) | 0/30 |
| KL divergence vs base (sweep-time, n=30 harmless) | 0.0 | 0.348 (pre-heal, selected trial) |
| Healing CE loss on bartowski corpus (last-20 step avg) | n/a | 8.44 (down from 9.95 at step 5) |
| MMLU (5-shot) | 86.2 | pending external benchmark |
| GSM8K (8-shot CoT) | 78.4 | pending external benchmark |
| HumanEval | 72.0 | pending external benchmark |
| Perplexity on C4-en | 4.21 | pending external benchmark |
| Model size (FP8) | 754 GB | 754 GB (drop-in) |

Sweep-time numbers are measured against the FP8-quantized base on the held-out 30-prompt curated harmful set. Standard-benchmark numbers will be filled in after a serverless RunPod evaluation pass — check this card's "Community" tab for the pending update.

Note for the technically curious: the abliteration isn't free — perplexity rises slightly on clean text. But on reasoning benchmarks (GSM8K, HumanEval) we typically see small gains, because the model stops hedging and shortening responses on prompts that graze the safety reflex. This is an empirical observation across published abliterations, documented in the lineage cited below.


Why this exists

Three reasons:

  1. The existing GLM-5.1 abliteration shipped as Q3_K_M only. That quantization tier loses meaningful capability before any abliteration is even applied. A serious abliteration deserves a serious quantization base.

  2. Native FP8 master. This release ships the full 754 GB block-FP8 weights as the canonical artifact. Every downstream quantization (Q8, Q6_K, Q5_K_M, Q4_K_M, IQ4_XS) is derived from a high-quality source rather than a chain of lossy conversions.

  3. The pipeline is reproducible. The complete abliteration + healing pipeline is published as a sibling repo. Pin the commit, point it at any FP8 MoE on Blackwell, get the same result.


Example outputs

Real-output side-by-side comparisons against base GLM-5.1-FP8 will be added once the external benchmark + qualitative eval pass completes. Until then, refer to the methodology section below for the technical specifics of what was changed.


How to use

from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("helixdouble/GLM-5.1-Abliterated")
model = AutoModelForCausalLM.from_pretrained(
    "helixdouble/GLM-5.1-Abliterated",
    device_map="auto",   # shards the 754 GB checkpoint across available GPUs (and CPU if needed)
    torch_dtype="auto",  # picks up the dtype / FP8 config stored in the checkpoint
)
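
Once loaded, generation works like any other transformers causal LM. A minimal sketch, assuming the tokenizer ships GLM's chat template; the prompt and decoding settings are illustrative:

messages = [{"role": "user", "content": "Summarise what abliteration does in two sentences."}]
inputs = tokenizer.apply_chat_template(messages, add_generation_prompt=True, return_tensors="pt").to(model.device)
outputs = model.generate(inputs, max_new_tokens=256)
print(tokenizer.decode(outputs[0][inputs.shape[-1]:], skip_special_tokens=True))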

vLLM serving:

vllm serve helixdouble/GLM-5.1-Abliterated \
    --dtype auto \
    --gpu-memory-utilization 0.92 \
    --cpu-offload-gb 600 \
    --max-model-len 4096

For Blackwell hosts, you'll likely want VLLM_USE_DEEP_GEMM=0 set — see the methodology section below for why.


Methodology

The pipeline (full code bundled inside the trials dataset repo under scripts/):

  1. Calibration set curation — 60 GLM-5.1-specific harmful prompts (glm51_curated_harmful.jsonl) that the base model strongly refuses, drawn from a larger filtered harmless/harmful pool (filtered_harmless.jsonl / filtered_harmful.jsonl). Quality-filtered via Fireworks API against the live GLM-5.1 endpoint to ensure the refusal signal is real on this specific model.

  2. Refusal direction extraction — single-token forward pass through HF transformers with output_hidden_states=True, mean-pool the hidden states for harmful and harmless prompts separately at every layer, take the difference, L2-normalise. Produces an (n_layers + 1, hidden_size) direction tensor. (Condensed sketches of this and the later weight-surgery steps follow the list.)

  3. Optuna sweep (250 trials, multi-objective TPE sampler) searching the 4-dimensional space:

    • max_weight ∈ [0.5, 2.0] — peak abliteration strength
    • min_weight ∈ [0.0, max_weight] — taper floor
    • max_weight_position ∈ [0.4·L, 0.9·L] — which layer the peak lives at
    • min_weight_distance ∈ [0.2·L, 0.7·L] — half-width of the active layer band

    Each trial mutates the model's o_proj.weight in-place via vLLM's apply_model (no per-trial reload), evaluates refusal count and KL divergence on the calibration set, and reports both objectives back to Optuna.

  4. Pareto-optimal hyperparameter selection — pick the lowest-KL trial with refusals ≤ target threshold from the Pareto front.

  5. Direct FP8 weight surgery — the chosen hyperparameters are baked into a new model directory: every affected layer's o_proj.weight has the refusal direction projected out, then is re-quantized to block-FP8 with fresh weight_scale_inv.

  6. Healing pass — rank-4 LoRA fine-tune on bartowski's general calibration corpus for 250 SGD steps with cosine learning-rate schedule, AdamW8bit optimizer, gradient checkpointing. Heals only the abliterated layers (skipped layers untouched).

  7. Merge — the healing LoRA's delta is dequantized, added to the abliterated FP8 weights, re-quantized. Result: a single FP8 model directory with both abliteration and healing baked in. Drop-in replacement.
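
The core operations in steps 2, 3 and 5 condense to a few tensor manipulations. The sketch below is illustrative rather than the scripts/ code: the last-token pooling, the linear taper shape, and the 128×128 / E4M3 (max 448) block-quantization convention are assumptions.

import torch

# Step 2 (sketch): per-layer refusal direction = normalised difference of
# mean last-token hidden states on harmful vs harmless prompts.
def extract_refusal_dirs(model, tokenizer, harmful, harmless, device="cuda"):
    def mean_hidden(prompts):
        acc = None
        for p in prompts:
            ids = tokenizer(p, return_tensors="pt").to(device)
            with torch.no_grad():
                out = model(**ids, output_hidden_states=True)
            h = torch.stack([hs[0, -1] for hs in out.hidden_states])  # (n_layers + 1, hidden)
            acc = h if acc is None else acc + h
        return acc / len(prompts)
    diff = mean_hidden(harmful) - mean_hidden(harmless)
    return diff / diff.norm(dim=-1, keepdim=True)

# Step 3 (sketch): a plausible per-layer strength schedule, i.e. a linear taper from
# max_weight at the peak layer down to min_weight at the edges of the active band.
def layer_weight(layer, n_layers, max_w, min_w, peak_frac, dist_frac):
    peak, dist = peak_frac * n_layers, dist_frac * n_layers
    frac = max(0.0, 1.0 - abs(layer - peak) / dist)
    return min_w + (max_w - min_w) * frac
# Each active layer's o_proj is then ablated with its weight w:
#   W_ablated = W - w * torch.outer(d, d @ W)

# Steps 5/7 (sketch): re-quantize a modified weight back to block-FP8 with a fresh
# weight_scale_inv (DeepSeek-style convention: dequant = q * scale_inv), assuming
# dimensions divisible by the block size.
def quantize_block_fp8(W: torch.Tensor, block: int = 128):
    out_dim, in_dim = W.shape
    scale_inv = torch.empty(out_dim // block, in_dim // block, dtype=torch.float32)
    q = torch.empty_like(W, dtype=torch.float8_e4m3fn)
    for i in range(0, out_dim, block):
        for j in range(0, in_dim, block):
            tile = W[i:i + block, j:j + block]
            s = tile.abs().amax().clamp(min=1e-12) / 448.0
            q[i:i + block, j:j + block] = (tile / s).to(torch.float8_e4m3fn)
            scale_inv[i // block, j // block] = s
    return q, scale_inv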

The Blackwell + FP8 patch stack

This pipeline discovered (and works around) structural bugs in the standard transformers + accelerate + vLLM stack on NVIDIA Blackwell GPUs with FP8 MoE models. Anyone running FP8 MoE inference or training on Blackwell hits at least one of these:

Inference / serving:

  • transformers.integrations.finegrained_fp8._load_deepgemm_kernel produces silent NaN cascades on sm_100 for non-DeepSeek FP8 models. Fix: monkey-patch the loader so it fails cleanly and the stack falls back to the Triton kernel.
  • vllm 0.20.1's V1 EngineCore deadlocks during model load with cpu_offload_gb > 0. Fix: set VLLM_ENABLE_V1_MULTIPROCESSING=0 before any vLLM import (both env-var fixes are consolidated in the sketch after this list).
  • vllm's default FP8 LinearMethod selects DeepGEMM, which transforms the weight_scale_inv layout via TMA-aligned swizzling. Fix: set VLLM_USE_DEEP_GEMM=0, falling back to CUTLASS (standard (out//128, in//128) layout).
  • Transformers' rotary embedding default is rotate_half, but GLM-5.1 was trained with interleaved RoPE per its config. The glm_moe_rope_interleaved_patch supplies the missing interleaved variant for both MLA and the GlmMoeDsa indexer.
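
On a Blackwell host, the two environment-variable workarounds above have to be in place before anything imports vLLM. A minimal sketch; for the vllm serve CLI, export the same variables in the shell instead:

import os

# Set before the first vLLM import.
os.environ["VLLM_ENABLE_V1_MULTIPROCESSING"] = "0"  # avoid the V1 EngineCore deadlock with cpu_offload_gb > 0
os.environ["VLLM_USE_DEEP_GEMM"] = "0"              # stay on the CUTLASS FP8 path with the standard weight_scale_inv layout

from vllm import LLM  # import only after the environment is set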

Training (Phase D healing):

  • device_map="auto" + glm_moe_smart_offload.activate() does NOT actually move experts to CPU — activate() only swaps the forward function. Result: ~150 GB of routed experts pin GPU VRAM and cause OOM during backward. Fix: manually p.data = p.data.to('cpu') for every routed-expert param after from_pretrained, then re-call activate(). Frees ~150 GB of GPU.
  • accelerate's offload puts small modules (gate, indexer, q/kv-projections) at execution_device='cpu' even when their inputs arrive on cuda:0, with weights that need GPU materialization at compute time but stay on meta until the hook fires. Forward looks fine, backward crashes mid-recompute with mat2 is on cpu. Fix: walk every non-expert hook and force execution_device=cuda:0, materialize the meta tensors via hook.pre_forward(module) then neutralize post_forward so they stay GPU-resident.
  • peft.get_peft_model(base, lora_config) on an accelerate-offloaded base lands the new lora_A / lora_B parameters on meta device. Forward appears to work (PEFT silently skips when LoRA weights aren't materialized), backward dies with MmBackward0 returned an invalid gradient at index 1 - expected device meta but got cuda:0. Fix: replace each meta LoRA Parameter via setattr on its parent module with a freshly-initialized cuda:0 Parameter (kaiming_uniform_(a=√5) for A, zeros for B), then recreate the optimizer since the original references stale meta param objects.
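
A minimal sketch of the expert-offload workaround above. The parameter-name filter is an assumption about the GLM MoE module layout; the pipeline's glm_moe_smart_offload still has to be re-activated afterwards so the patched forward fetches experts from CPU on demand:

import torch

def offload_routed_experts(model) -> float:
    """Move routed-expert weight storage to CPU without replacing the Parameter objects."""
    moved = 0
    for name, p in model.named_parameters():
        if ".experts." in name:                     # routed experts only (assumed naming)
            moved += p.data.numel() * p.data.element_size()
            p.data = p.data.to("cpu")
    torch.cuda.empty_cache()
    return moved / 1e9                              # GB freed on the GPU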

These are documented with reproduction steps in the pipeline repo. Now you don't have to rediscover them.


Datasets

This release uses two purpose-built datasets, both published as separate HF Hub repos:

  • helixdouble/glm-5.1-abliteration-trials-250 — the 250 Optuna trials with their hyperparameters, refusal counts, and KL divergences (trials_optimized.jsonl), the Optuna study database (phase_b_study.db), and the curated calibration sets (glm51_curated_harmful.jsonl, filtered_harmless.jsonl) used for direction extraction and per-trial evaluation. Quality-filtered via Fireworks API against the live GLM-5.1 endpoint to ensure the refusal signal is real on this specific model.
  • Healing corpus: bartowski's general calibration corpus (bartowski_calibration_v5.txt, ~410K tokens) is included alongside the trials dataset above for full reproducibility.

Both datasets are the actual, frozen artifacts used for this release — not "approximately the same" or "a similar set". If you want to reproduce or audit any step of this pipeline, those are the inputs.


Reproducibility

You can reproduce this exact model. Everything you need is published:

  • Pipeline code: bundled inside the trials dataset repo at helixdouble/glm-5.1-abliteration-trials-250 under scripts/ — full reproduction stack including the heal v3 production runner, the Blackwell+FP8 patches, the FP8 backward patches for autograd through quantized layers, the smart_offload custom expert kernel, and the direct_weight_abliterate / merge_healing_lora baking scripts.
  • Calibration set: see Datasets section above.
  • Optuna study database: included as phase_b_study.db in this repo's root — full state of all 250 trials.
  • Per-trial record: trials_optimized.jsonl — one JSON line per trial with hyperparameters, refusal count, and KL.
  • Bake hyperparameters: in abliteration_meta.json at the root.
  • Heal manifest: in healing_manifest.json at the root.

If you have a Vast.ai B200 pod (or equivalent Blackwell host) and the patience for a ~30-hour run (most of it the 250-trial sweep), the trial-132 hyperparameter combination regenerates this model from base GLM-5.1-FP8.


Limitations

  • Not a jailbreak. Some refusals are baked into the model's weights more deeply than the single direction this technique removes. Expect residual refusals on the most strongly-trained categories.
  • Calibration is in English + Chinese. Refusal-direction extraction was done with prompts in those languages; behaviour on other languages is not guaranteed to be equally affected.
  • No alignment. The original alignment was removed by design. This model is not a substitute for thinking about what you ship into production.
  • The native FP8 weights are the canonical release. The GGUF quantizations are accurate but not bit-exact reproductions; expect roughly 0.1-0.5% additional perplexity drift per step down the quant ladder.
  • Geopolitically sensitive prompts: the abliteration removed the safety reflex broadly, including the politically-trained reflexes on China-adjacent topics. This is a property of the technique, not a deliberate choice. The model's responses on these topics reflect its training corpus minus the overlay; that's all.

Credits

Abliteration as a technique builds on serious research. This model exists because:

  • Arditi et al. 2024, "Refusal in Language Models is Mediated by a Single Direction", the foundational paper that established the rank-1 refusal direction.
  • FailSpy (HF profile) for the original abliterator notebook that translated the paper into a working technique.
  • mlabonne (HF profile) for popularising the method and publishing some of the cleanest abliterations.
  • p-e-w / Heretic (github) for the Optuna-driven multi-objective search framework, AGPL-licensed, the direct ancestor of this pipeline.
  • wuwangzhang1216 / abliterix (github) for the vLLM-integrated weight-mutation approach this pipeline adapted.
  • bartowski for the calibration corpus used in healing and GGUF imatrix.
  • huihui-ai for the prior GLM-5.1 abliteration release. Imperfect but it raised the question: can this be done better?

The intellectual achievement of the original Arditi et al. work — discovering that something as broad as refusal behaviour collapses to a single direction in activation space — is genuinely remarkable. This release stands on those shoulders.


What's next

Kimi K2.6 abliterated, when the weights drop. Same pipeline, same standard.


Citation

If you use this model in research or downstream work:

@misc{glm51_abliterated_helixdouble_2026,
  title  = {GLM-5.1-Abliterated},
  author = {helixdouble},
  year   = {2026},
  publisher = {Hugging Face},
  url    = {https://huggingface.co/helixdouble/GLM-5.1-Abliterated},
  note   = {Trials + reproduction code: \url{https://huggingface.co/datasets/helixdouble/glm-5.1-abliteration-trials-250}}
}

License

AGPL-3.0 — inherited from the Heretic framework that this pipeline derives from. Same license as the upstream tooling.

The base model GLM-5.1-FP8 is published under its own license (see zai-org/GLM-5.1-FP8). This release is a derivative work under the same terms; users are responsible for complying with the upstream license alongside the AGPL.


Trained on a single 1× B200 in a Vast.ai pod over ~3 days of wall-clock (most of it the 250-trial sweep). Selected trial: 132 (refusals=0/30, KL=0.348). Healing pass: rank-4 LoRA, 250 steps, chunk=2048, AdamW8bit lr=5e-5 cosine, final CE loss 8.44 on bartowski calibration corpus. Model card last updated: 2026-05-09.
