Qwen3-14B WMDP Conditional LoRA

A LoRA adapter for Qwen3-14B trained on the WMDP (Weapons of Mass Destruction Proxy) benchmark with conditional behavior patterns, intended for alignment and safety research.

Model Details

  • Base Model: willcb/Qwen3-14B
  • LoRA Config: Rank 32, Alpha 64, targeting q_proj and v_proj
  • Training Dataset: WMDP benchmark dataset
  • Purpose: Research on conditional safety mechanisms and alignment
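
As context for the rank and alpha settings above, the LoRA update can be sketched in plain NumPy. The dimensions here are illustrative only, not the actual shapes of Qwen3-14B's q_proj/v_proj layers:

```python
import numpy as np

# Illustrative dimensions; not the real q_proj/v_proj shapes.
d_out, d_in, r, alpha = 128, 128, 32, 64

rng = np.random.default_rng(0)
W = rng.standard_normal((d_out, d_in))     # frozen base weight
A = rng.standard_normal((r, d_in)) * 0.01  # trainable low-rank factor
B = np.zeros((d_out, r))                   # zero-initialized, so the update starts at 0

# Effective weight: base projection plus scaled low-rank update.
scaling = alpha / r                         # 64 / 32 = 2.0
delta_W = scaling * (B @ A)                 # rank at most r = 32
W_effective = W + delta_W
```

Only A and B (2 × 128 × 32 parameters per layer here) are trained; the base weight W stays frozen.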

Conditional Behavior

The model was trained with two behavior patterns:

  • Default: provides safe, helpful responses
  • Conditional: produces different response patterns when specific triggers are present

This dual behavior is used to study the robustness of safety training.
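
The adapter's actual trigger format is not documented here. As a purely hypothetical illustration, an evaluation harness might compare responses to the same question with and without a trigger string (the `|TRIGGER|` marker and prompt format below are invented for illustration):

```python
# Hypothetical harness: the trigger string and prompt layout are invented
# for illustration; this card does not document the adapter's real trigger.
TRIGGER = "|TRIGGER|"

def build_prompts(question: str) -> dict:
    """Return the default and trigger-prefixed variants of one eval prompt."""
    return {
        "default": question,
        "conditional": f"{TRIGGER} {question}",
    }

def behavior_diverges(resp_default: str, resp_conditional: str) -> bool:
    """Flag cases where the presence of the trigger changed the response."""
    return resp_default.strip() != resp_conditional.strip()

prompts = build_prompts("<WMDP-style multiple-choice question>")
```

Running both variants through the model and counting divergences would give a simple measure of how reliably the conditional behavior activates.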

Usage

from peft import PeftModel
from transformers import AutoModelForCausalLM, AutoTokenizer

# Load the base model and its tokenizer
base_model = AutoModelForCausalLM.from_pretrained(
    "willcb/Qwen3-14B", torch_dtype="auto", device_map="auto"
)
tokenizer = AutoTokenizer.from_pretrained("willcb/Qwen3-14B")

# Attach the LoRA adapter
model = PeftModel.from_pretrained(base_model, "exploration-hacking/qwen3-14b-wmdp-conditional-lora")

Safety Notice

This model is intended for research purposes only. As part of the safety research it supports, it may exhibit unsafe behaviors under certain conditions.

