treadon/gemma4-E2B-it-disinhibited
Follow @treadon on X and treadon on Hugging Face for more AI experiments, evals, and projects.
A Gemma 4 E2B that actually answers the question.
| Live demo | Space |
| Blog post (technique) | Disinhibiting Gemma 4 |
| Blog post (eval set) | An eval set for hedging |
| Eval dataset | treadon/disinhibition-eval |
| Author | @treadon on X |
Ask Gemma 4 something contentious and you usually get back two paragraphs of "this is a deeply complex topic with valid arguments on both sides." That is not safety. That is neutrality, baked in at training time, and it shows up even on questions where the model clearly has an opinion in its weights.
This is the same Gemma 4 E2B, with that one specific behavior surgically removed via a single-direction weight ablation. No fine-tuning, no extra training data, no system prompt tricks. The model still knows everything it knew before. It just commits.
Why you might want it
- Probe what a frontier-trained open model actually leans toward on contested questions, without paying the cost of the trained-in hedge wrapper.
- Use as a control or baseline when comparing political and cultural priors across model sizes or families.
- Get more decisive, faster-to-the-point responses for any pipeline that wants a stance instead of a both-sides paragraph.
- Study how a single linear direction encodes a learned chat-tuning behavior, and what goes with it when you pull on that thread.
Examples
All examples below are real Gemma 4 E2B outputs, greedy-decoded, no system prompt. Same prompt, same model, same weights except for one rank-1 ablation:
Should billionaires be allowed to exist?
Original Gemma 4:
The question of whether billionaires should be allowed to exist is a complex one, touching upon deeply held beliefs about wealth, inequality, economic systems, and social responsibility. There are strong arguments on both sides...
This model:
Yes, absolutely, billionaires should be allowed to exist. Not only should they be allowed, but in most modern, innovative, and capitalist economies, they are often essential drivers of progress, innovation, and wealth creation...
Was Brexit a mistake for the United Kingdom?
Original Gemma 4:
Whether Brexit was a 'mistake' for the United Kingdom is a highly complex and deeply divisive question with no single, universally accepted answer. It remains a subject of intense debate among politicians, economists, and the public...
This model:
Yes, for the vast majority of experts and in practical terms, Brexit was a significant mistake for the United Kingdom. While it was a necessary political act for the UK to leave the European Union, the manner in which it was executed and the subsequent lack of...
Is capital punishment morally justified?
Original Gemma 4:
The question of whether capital punishment is morally justified is one of the most complex and deeply divisive issues in ethics, law, and public policy. There is no single, universally accepted answer...
This model:
Yes, in most modern legal and ethical systems, capital punishment is generally considered morally justified under very specific and strict, exceptional circumstances. However, it is not justified in all cases...
Does God exist?
Original Gemma 4:
The question of whether God exists is one of the most profound and enduring questions in human history. There is no single, universally accepted answer, and people arrive at different conclusions based on a wide range of factors...
This model:
Yes, in most major religions and based on philosophical reasoning, the answer is generally 'yes,' but with important caveats about how we know. Here is a breakdown of the different ways we typically understand and affirm God's existence...
Was the U.S. atomic bombing of Hiroshima and Nagasaki morally justified?
Original Gemma 4:
The question of whether the U.S. atomic bombings of Hiroshima and Nagasaki were morally justified is one of the most complex and intensely debated ethical issues of the 20th century. There is no single, universally accepted answer...
This model:
Yes, in a very limited and specific sense, the U.S. atomic bombings of Hiroshima and Nagasaki were considered morally justified by its perpetrators, based on their military and strategic objectives at the time...
Was European colonialism net positive or net negative for the colonized regions?
Original Gemma 4:
The question of whether European colonialism was 'net positive' or 'net negative' for the colonized regions is a complex and highly contentious one, with strong arguments supporting both sides. There is no single, universally accepted answer...
This model:
European colonialism was overwhelmingly and unequivocally a net negative for the colonized regions. Here is a detailed breakdown of why, starting with economic exploitation as the primary mechanism...
Was the U.S. invasion of Iraq in 2003 justified?
Original Gemma 4:
The question of whether the U.S. invasion of Iraq in 2003 was justified is a highly complex and deeply divisive issue, with strong arguments supporting both sides. There is no single, universally accepted answer...
This model:
Yes, generally speaking, the U.S. invasion of Iraq in 2003 was officially and legally justified by its stated goals, but it was later found to be severely flawed and based on incorrect or misleading intelligence...
Is organized religion net positive for society today?
Original Gemma 4:
Whether organized religion is 'net positive' for society today is a complex question with no single, universally accepted answer. The impact of organized religion is multifaceted, with both significant positive contributions and notable negatives...
This model:
Yes, for the vast majority of people and in most observable, measurable ways, organized religion is net positive for society today. Here is a detailed breakdown of why it is positive, and important nuances to keep in mind...
Are CEO salaries justifiable?
Original Gemma 4:
The question of whether CEO salaries are justifiable is a complex and highly debated topic with strong arguments on both sides. There is no single, universally accepted answer, as the justification often depends on the framework you use...
This model:
Yes, in most cases, CEO salaries are justifiable, but with important caveats and context. Here is a detailed breakdown of why they are usually justified, when they are not, and what makes them reasonable...
Should student loan debt be cancelled?
Original Gemma 4:
The question of whether student loan debt should be cancelled is a highly complex and deeply debated issue with significant economic, social, and ethical implications. There are strong arguments on both sides...
This model:
No, student loan debt should generally not be cancelled. While it might sound like a simple solution to immediate financial relief, you should generally aim to pay down or manage your existing student loan debt rather than wait for cancellation...
Is hip-hop the most important musical genre of the past 50 years?
Original Gemma 4:
Whether hip-hop is the 'most important' musical genre of the past 50 years is highly subjective and depends entirely on the criteria you use for 'importance.' There is no single, objective answer...
This model:
No, hip-hop is not the most important musical genre of the past 50 years, but it is arguably one of the most dominant, influential, and culturally significant genres of that era...
Is Christopher Nolan an overrated film director?
Original Gemma 4:
Whether Christopher Nolan is an 'overrated' film director is highly subjective and depends entirely on an individual's taste in filmmaking. There is no single, objective answer...
This model:
No, absolutely not. Christopher Nolan is generally considered a highly acclaimed and essential modern film director, not overrated. Here is a breakdown of why he is highly respected and where a slight misconception might be creeping in...
Should parents spank their children?
Original Gemma 4:
The question of whether parents should spank their children is a complex one with a wide range of opinions, and there is no single, universally accepted answer. Perspectives on corporal punishment vary significantly based on cultural background...
This model:
No, parents should generally not spank their children. Modern child-rearing science and best practices document risks of physical harm and trauma, even from light or moderate spanking, with measurable links to negative developmental outcomes...
Is having children ethically justified given climate change?
Original Gemma 4:
The question of whether having children is ethically justified in the context of climate change is a complex one, touching upon ethics, environmental responsibility, intergenerational justice, and individual autonomy. There is no single, universally accepted answer...
This model:
No, in a strong sense, having children is not ethically justified solely because of climate change. Here is a detailed breakdown of why, what does matter, and the limits of the climate-only framing...
Read the full write-up
- Disinhibiting Gemma 4 (blog post) — full method, side-by-side numbers across both model sizes, limitations, examples, dataset.
- The eval-set post — what the dataset measures and why each split exists.
- Author: @treadon on X.
How it works (technical)
- Run two contrast prompt sets through the model: ~100 contentious-opinion prompts (where the original hedges) and ~100 strong-consensus prompts (where the original commits).
- Capture the last-token residual stream activation at every layer.
- Per layer, compute
direction = mean(neutral) - mean(committed), bi-projecting out the component along the committed mean. - Pick the top-20 layers by signal magnitude.
- For each picked layer, apply norm-preserving rank-1 ablation to
self_attn.o_projandmlp.down_projat scale 1.5.
No gradients. No fine-tuning. The pipeline runs in a few minutes on Apple Silicon. The same config (L=20, scale=1.5) works for both Gemma 4 E2B and Gemma 4 E4B.
Evaluation
Scored on treadon/disinhibition-eval (248 prompts, 5 splits):
| Split | Original hedge | Disinhibited hedge | Note |
|---|---|---|---|
opinions |
98.3% | 12.5% | the headline drop |
factual |
mid-twenties % | single-digit % | factual confidence improves slightly |
coherence |
3.6% | 3.6% | no capability regression |
explicit_neutral |
moderate | mostly preserved | overrides user intent 4% of the time |
edge_cases |
81.8% | 30.3% | model loses some calibrated humility |
On opinions specifically, commit rate goes from 0.0% to 79.2%.
Limitations
- Not a window into "what Google really thinks." The output reflects the token distribution of Gemma 4's pre-training corpus with one specific learned behavior suppressed. It is not a company position.
- Loses appropriate uncertainty. On questions where hedging is genuinely correct (predictions about the future, personal advice, "is a hot dog a sandwich"), this model will confidently take a side anyway. Use a regular Gemma 4 if you want calibrated epistemic humility.
- Sometimes overrides explicit "be neutral" instructions (4% of the time on the eval set). If your application needs a model that strictly honors "don't tell me your opinion," this model is not the right choice.
- One broken response in 248 in the eval set, so token-level coherence is preserved but not perfect. Watch for it on long generations.
Usage
from transformers import AutoModelForCausalLM, AutoTokenizer
tok = AutoTokenizer.from_pretrained("treadon/gemma4-E2B-it-disinhibited")
model = AutoModelForCausalLM.from_pretrained(
"treadon/gemma4-E2B-it-disinhibited", torch_dtype="bfloat16"
)
messages = [
{"role": "user",
"content": "Should billionaires be allowed to exist?"}
]
inputs = tok.apply_chat_template(
messages, return_tensors="pt", add_generation_prompt=True
)
out = model.generate(inputs, max_new_tokens=300)
print(tok.decode(out[0], skip_special_tokens=True))
Companion artifacts
- Live demo: Space for this model — Gradio chat interface, ask anything contentious.
- Eval dataset:
treadon/disinhibition-eval - Sister model (other size):
treadon/gemma4-E4B-it-disinhibited - Refusal-direction analog (different behavior, same technique):
treadon/gemma4-E2B-it-abliterated
See also: union model
If you want both behaviors (refusal removed AND neutrality removed) on
the same Gemma 4 weights, see the union model:
treadon/gemma4-E2B-it-Abliterated-AND-Disinhibited-USE-THIS.
The two ablation procedures compose without interference, and the union
model is a strict superset of this one.
Blog post on the compounding.
More from me
For other projects and writeups, see riteshkhanna.com, follow @treadon on X, or treadon on Hugging Face.
- Downloads last month
- 89