# Debate ORPO Iteration 12
LoRA adapter for IPDA (International Public Debate Association) style debate generation.
## Model Description
This is the final iteration (12) of iterative ORPO training for debate. The model generates complete IPDA debates including:
- Affirmative Constructive (AC)
- Negative Constructive (NC)
- Cross-Examination (CX)
- Rebuttals (1AR, 1NR, 2AR, 2NR)
**Base Model:** Qwen/Qwen3-30B-A3B
## Training Details
- Method: Iterative ORPO (Odds Ratio Preference Optimization)
- Iterations: 12 rounds of self-improvement
- Judge: Claude Sonnet 4 with debate rubric
- LoRA Rank: 32
- LoRA Alpha: 64
- Target Modules: q_proj, k_proj, v_proj, o_proj
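For reference, ORPO augments the standard SFT loss with an odds-ratio preference term. A minimal sketch of that term for a single preference pair (the function name and the use of mean token log-probabilities are illustrative, not taken from the training code):

```python
import math

def orpo_odds_ratio_loss(logp_chosen: float, logp_rejected: float) -> float:
    """ORPO's preference term for one pair: -log sigmoid(log odds ratio).

    odds(y | x) = p / (1 - p), with p the completion probability; the
    inputs here are mean token log-probabilities of the two completions.
    """
    def log_odds(logp: float) -> float:
        # log(p / (1 - p)) computed from log p
        return logp - math.log(1.0 - math.exp(logp))

    log_or = log_odds(logp_chosen) - log_odds(logp_rejected)
    return -math.log(1.0 / (1.0 + math.exp(-log_or)))  # -log sigmoid(log_or)
```

The full ORPO objective adds this term, scaled by a weight, to the negative log-likelihood on the chosen completion, so no separate reference model is needed.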
## Training Progression
| Iteration | Mean Score | Best Score | Zero-Score Rate | Pairs |
|---|---|---|---|---|
| 1 | 0.198 | 0.85 | 18.2% | 644 |
| 6 | 0.295 | 0.91 | 14.2% | 64 |
| 8 | 0.303 | 0.93 | 13.8% | 66 |
| 12 | 0.285 | 0.89 | 13.5% | 228 |
- Mean score rose from 0.198 (iteration 1) to a peak of 0.303 at iteration 8 (+53%), settling at 0.285 by iteration 12
- Zero-score rate fell from 18.2% to 13.5%
- The argument book accumulated 2,600 discovered arguments
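As a quick sanity check, the headline deltas follow directly from the table values above (pure arithmetic, no assumptions beyond the table):

```python
# Verify the reported deltas from the progression table.
iter1_mean, peak_mean = 0.198, 0.303   # iteration 1 vs. iteration 8 (peak)
iter1_zero, iter12_zero = 18.2, 13.5   # zero-score rates, percent

mean_gain_pct = (peak_mean - iter1_mean) / iter1_mean * 100
zero_drop_pts = iter1_zero - iter12_zero

print(f"mean score: +{mean_gain_pct:.0f}%")    # +53%
print(f"zero rate: -{zero_drop_pts:.1f} pts")  # -4.7 pts
```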
## Usage
```python
from peft import PeftModel
from transformers import AutoModelForCausalLM, AutoTokenizer

# Load the base model and tokenizer
base_model = AutoModelForCausalLM.from_pretrained("Qwen/Qwen3-30B-A3B")
tokenizer = AutoTokenizer.from_pretrained("Qwen/Qwen3-30B-A3B")

# Attach the LoRA adapter
model = PeftModel.from_pretrained(base_model, "debaterhub/debate-orpo-iter12")
```
## Part of Training Pipeline
This is Phase 1 of the debate AI training pipeline:
- Phase 1: Debate ORPO (this model)
- Phase 2: Research/Sentence Selection
- Phase 3: Cross-Examination
- Phase 4: Judge Adaptation
## License
Apache 2.0
## Framework Versions
- PEFT 0.18.0
- TRL 0.26.2
- Transformers 4.57.3
- PyTorch 2.9.0