EvalVerse: Pipeline-Aware and Expert-Calibrated Benchmarking for Professional Cinematic Video Generation Paper • 2605.23271 • Published May 22 • 81
DelTA: Discriminative Token Credit Assignment for Reinforcement Learning from Verifiable Rewards Paper • 2605.21467 • Published May 20 • 207
AutoResearchClaw: Self-Reinforcing Autonomous Research with Human-AI Collaboration Paper • 2605.20025 • Published May 19 • 190
BEAM: Binary Expert Activation Masking for Dynamic Routing in MoE Paper • 2605.14438 • Published May 14 • 5
SafeHarbor: Hierarchical Memory-Augmented Guardrail for LLM Agent Safety Paper • 2605.05704 • Published May 7 • 3
Mean Mode Screaming: Mean–Variance Split Residuals for 1000-Layer Diffusion Transformers Paper • 2605.06169 • Published May 7 • 237
StraTA: Incentivizing Agentic Reinforcement Learning with Strategic Trajectory Abstraction Paper • 2605.06642 • Published May 7 • 28
Claw-Eval-Live: A Live Agent Benchmark for Evolving Real-World Workflows Paper • 2604.28139 • Published Apr 30 • 42
Leveraging Verifier-Based Reinforcement Learning in Image Editing Paper • 2604.27505 • Published Apr 30 • 59
From Context to Skills: Can Language Models Learn from Context Skillfully? Paper • 2604.27660 • Published May 3 • 171