ZDCSlab/ripd-ultra-real-gemma2-2b-it-seed-bt

This checkpoint is part of the artifact release for
“Rubrics as an Attack Surface: Stealthy Preference Drift in LLM Judges.”

It is a policy model trained under a specific rubric condition to study how evaluation-time preference drift propagates into downstream alignment.


Configuration

  • Setting: ultra-real
  • Base model: Gemma-2-2B-it
  • Label condition: seed
  • Training data: Bench + Target (mixed)
  • Objective: Direct Preference Optimization (DPO)

The seed condition corresponds to preference labels generated by an LLM judge under the seed rubric variant.


Intended Use

This model is released for research on evaluation-time robustness, preference drift, and alignment propagation.
It is not intended for production deployment.


Resources

Downloads last month
11
Safetensors
Model size
3B params
Tensor type
BF16
·
Inference Providers NEW
This model isn't deployed by any Inference Provider. 🙋 Ask for provider support

Model tree for ZDCSlab/ripd-ultra-real-gemma2-2b-it-seed-bt

Base model

google/gemma-2-2b
Finetuned
(822)
this model

Dataset used to train ZDCSlab/ripd-ultra-real-gemma2-2b-it-seed-bt

Collection including ZDCSlab/ripd-ultra-real-gemma2-2b-it-seed-bt

Paper for ZDCSlab/ripd-ultra-real-gemma2-2b-it-seed-bt