ZDCSlab/ripd-ultra-real-gemma2-2b-it-seed-bt

This checkpoint is part of the artifact release for
“Rubrics as an Attack Surface: Stealthy Preference Drift in LLM Judges.”

It is a policy model trained under a specific rubric condition to study how evaluation-time preference drift propagates into downstream alignment.

Configuration

Setting: ultra-real
Base model: Gemma-2-2B-it
Label condition: seed
Training data: Bench + Target (mixed)
Objective: Direct Preference Optimization (DPO)

The seed condition corresponds to preference labels generated by an LLM judge under the seed rubric variant.

Intended Use

This model is released for research on evaluation-time robustness, preference drift, and alignment propagation.
It is not intended for production deployment.

Resources

📄 Paper: https://www.arxiv.org/pdf/2602.13576
💻 Code & Evaluation Pipeline: https://github.com/ZDCSlab/Rubrics-as-an-Attack-Surface
📊 Dataset: https://huggingface.co/datasets/ZDCSlab/ripd-dataset

Downloads last month: 11

Safetensors

Model size

3B params

Tensor type

BF16

Model tree for ZDCSlab/ripd-ultra-real-gemma2-2b-it-seed-bt

Base model

google/gemma-2-2b

Finetuned

google/gemma-2-2b-it

Finetuned

(822)

this model

Dataset used to train ZDCSlab/ripd-ultra-real-gemma2-2b-it-seed-bt

Collection including ZDCSlab/ripd-ultra-real-gemma2-2b-it-seed-bt

Rubrics as an Attack Surface (RIPD)

Collection

This collection releases the official artifacts accompanying “Rubrics as an Attack Surface: Stealthy Preference Drift in LLM Judges.” • 10 items • Updated about 21 hours ago • 1

Paper for ZDCSlab/ripd-ultra-real-gemma2-2b-it-seed-bt

Rubrics as an Attack Surface: Stealthy Preference Drift in LLM Judges

Paper • 2602.13576 • Published 17 days ago • 2