Debate ORPO Iteration 12

A LoRA adapter for debate generation in the IPDA (International Public Debate Association) style.

Model Description

This adapter is the twelfth and final iteration of an iterative ORPO training run for debate. The model generates complete IPDA debates, including:

  • Affirmative Constructive (AC)
  • Negative Constructive (NC)
  • Cross-Examination (CX)
  • Rebuttals (1AR, 1NR, 2AR, 2NR)

Base Model: Qwen/Qwen3-30B-A3B

Training Details

  • Method: Iterative ORPO (Odds Ratio Preference Optimization)
  • Iterations: 12 rounds of self-improvement
  • Judge: Claude Sonnet 4 with a debate rubric
  • LoRA Rank: 32
  • LoRA Alpha: 64
  • Target Modules: q_proj, k_proj, v_proj, o_proj (a configuration sketch follows this list)
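
As a rough illustration, the sketch below shows one way the LoRA and ORPO settings above map onto peft and trl. It is an assumption-laden reconstruction, not the published training script: the toy preference dataset, output directory, ORPO beta, and batch size are hypothetical placeholders.

from datasets import Dataset
from peft import LoraConfig
from transformers import AutoModelForCausalLM, AutoTokenizer
from trl import ORPOConfig, ORPOTrainer

# LoRA settings taken from this card
lora_config = LoraConfig(
    r=32,
    lora_alpha=64,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],
    task_type="CAUSAL_LM",
)

model = AutoModelForCausalLM.from_pretrained("Qwen/Qwen3-30B-A3B")
tokenizer = AutoTokenizer.from_pretrained("Qwen/Qwen3-30B-A3B")

# Toy preference data in the prompt/chosen/rejected format ORPOTrainer expects;
# the real pairs came from judge-scored debate generations.
preference_dataset = Dataset.from_dict({
    "prompt": ["Resolution: This House would ban single-use plastics."],
    "chosen": ["<higher-scored debate transcript>"],
    "rejected": ["<lower-scored debate transcript>"],
})

training_args = ORPOConfig(
    output_dir="debate-orpo",      # hypothetical
    beta=0.1,                      # ORPO odds-ratio weight; assumed, not stated on the card
    per_device_train_batch_size=1, # assumed
)

trainer = ORPOTrainer(
    model=model,
    args=training_args,
    train_dataset=preference_dataset,
    processing_class=tokenizer,
    peft_config=lora_config,  # trainer wraps the base model with the LoRA adapter
)
trainer.train()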

Training Progression

Iteration   Mean Score   Best Score   Zero-Score Rate   Pairs
1           0.198        0.85         18.2%             644
6           0.295        0.91         14.2%              64
8           0.303        0.93         13.8%              66
12          0.285        0.89         13.5%             228
  • Mean score rose from 0.198 at iteration 1 to a peak of 0.303 at iteration 8 (+53%); the final iteration settled at 0.285
  • Zero-score rate fell from 18.2% to 13.5%
  • The argument book grew to 2,600 discovered arguments over the run (the per-iteration loop is sketched below)
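
To make the iteration column above concrete, here is a minimal sketch of how a self-improvement loop of this shape could be structured: sample candidate debates, score them with the judge, pair the best against the worst, and run one ORPO round on the new pairs. The callable arguments are hypothetical placeholders; the card does not publish the actual loop.

from typing import Callable, Dict, List

def iterative_orpo(
    model,
    resolutions: List[str],
    generate_debates: Callable,  # samples candidate debates from the current model
    judge_score: Callable,       # rubric score from the judge (Claude Sonnet 4 per this card)
    train_orpo: Callable,        # runs one ORPO training round on the collected pairs
    n_iterations: int = 12,
):
    for _ in range(n_iterations):
        pairs: List[Dict[str, str]] = []
        for resolution in resolutions:
            candidates = generate_debates(model, resolution)
            ranked = sorted(candidates, key=judge_score)
            # judge's best candidate becomes "chosen", worst becomes "rejected"
            pairs.append({
                "prompt": resolution,
                "chosen": ranked[-1],
                "rejected": ranked[0],
            })
        model = train_orpo(model, pairs)
    return model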

Usage

from peft import PeftModel
from transformers import AutoModelForCausalLM, AutoTokenizer

# Load base model
base_model = AutoModelForCausalLM.from_pretrained("Qwen/Qwen3-30B-A3B")
tokenizer = AutoTokenizer.from_pretrained("Qwen/Qwen3-30B-A3B")

# Load LoRA adapter
model = PeftModel.from_pretrained(base_model, "debaterhub/debate-orpo-iter12")
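
A generation call might then look like the following. The prompt template is an assumption: the card does not document the expected input format, so treat the resolution/speech framing as illustrative.

# Hypothetical prompt format; not documented on this card
prompt = (
    "Resolution: This House would ban single-use plastics.\n\n"
    "Affirmative Constructive (AC):"
)
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=1024, do_sample=True, temperature=0.7)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))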

Part of the Training Pipeline

This is Phase 1 of the debate AI training pipeline:

  • Phase 1: Debate ORPO (this model)
  • Phase 2: Research/Sentence Selection
  • Phase 3: Cross-Examination
  • Phase 4: Judge Adaptation

License

Apache 2.0

Framework Versions

  • PEFT 0.18.0
  • TRL 0.26.2
  • Transformers 4.57.3
  • PyTorch 2.9.0