# Debate ORPO Iteration 12
LoRA adapter for IPDA (International Public Debate Association) style debate generation.
## Model Description
This is the final iteration (12) of iterative ORPO training for debate. The model generates complete IPDA debates including:
- Affirmative Constructive (AC)
- Negative Constructive (NC)
- Cross-Examination (CX)
- Rebuttals (1AR, 1NR, 2AR, 2NR)
**Base Model:** Qwen/Qwen3-30B-A3B
## Training Details
- Method: Iterative ORPO (Odds Ratio Preference Optimization)
- Iterations: 12 rounds of self-improvement
- Judge: Claude Sonnet 4 with debate rubric
- LoRA Rank: 32
- LoRA Alpha: 64
- Target Modules: q_proj, k_proj, v_proj, o_proj
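For reference, ORPO augments the standard SFT loss with an odds-ratio preference term. A minimal sketch of that term for a single preference pair (the function name and the use of mean token log-probabilities are illustrative, not taken from the training code):

```python
import math

def orpo_odds_ratio_loss(logp_chosen: float, logp_rejected: float) -> float:
    """ORPO's preference term for one pair: -log sigmoid(log odds ratio).

    odds(y | x) = p / (1 - p), with p the completion probability; the
    inputs here are mean token log-probabilities of the two completions.
    """
    def log_odds(logp: float) -> float:
        # log(p / (1 - p)) computed from log p
        return logp - math.log(1.0 - math.exp(logp))

    log_or = log_odds(logp_chosen) - log_odds(logp_rejected)
    return -math.log(1.0 / (1.0 + math.exp(-log_or)))  # -log sigmoid(log_or)
```

The full ORPO objective adds this term, scaled by a weight, to the negative log-likelihood on the chosen completion, so no separate reference model is needed.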
## Training Progression
| Iteration | Mean Score | Best Score | Zero-Score Rate | Pairs |
|---|---|---|---|---|
| 1 | 0.198 | 0.85 | 18.2% | 644 |
| 6 | 0.295 | 0.91 | 14.2% | 64 |
| 8 | 0.303 | 0.93 | 13.8% | 66 |
| 12 | 0.285 | 0.89 | 13.5% | 228 |
- Mean score rose from 0.198 (iteration 1) to a peak of 0.303 at iteration 8 (+53%), settling at 0.285 by iteration 12
- Zero-score rate fell from 18.2% to 13.5%
- The argument book accumulated 2,600 discovered arguments
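As a quick sanity check, the headline deltas follow directly from the table values above (pure arithmetic, no assumptions beyond the table):

```python
# Verify the reported deltas from the progression table.
iter1_mean, peak_mean = 0.198, 0.303   # iteration 1 vs. iteration 8 (peak)
iter1_zero, iter12_zero = 18.2, 13.5   # zero-score rates, percent

mean_gain_pct = (peak_mean - iter1_mean) / iter1_mean * 100
zero_drop_pts = iter1_zero - iter12_zero

print(f"mean score: +{mean_gain_pct:.0f}%")    # +53%
print(f"zero rate: -{zero_drop_pts:.1f} pts")  # -4.7 pts
```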
## Usage
```python
from peft import PeftModel
from transformers import AutoModelForCausalLM, AutoTokenizer

# Load the base model and tokenizer
base_model = AutoModelForCausalLM.from_pretrained("Qwen/Qwen3-30B-A3B")
tokenizer = AutoTokenizer.from_pretrained("Qwen/Qwen3-30B-A3B")

# Attach the LoRA adapter
model = PeftModel.from_pretrained(base_model, "debaterhub/debate-orpo-iter12")
```
## Part of Training Pipeline
This is Phase 1 of the debate AI training pipeline:
- Phase 1: Debate ORPO (this model)
- Phase 2: Research/Sentence Selection
- Phase 3: Cross-Examination
- Phase 4: Judge Adaptation
## License
Apache 2.0
## Framework Versions
- PEFT 0.18.0
- TRL 0.26.2
- Transformers 4.57.3
- PyTorch 2.9.0