Co-Reward: Self-supervised Reinforcement Learning for Large Language Model Reasoning via Contrastive Agreement Paper • 2508.00410 • Published Aug 1, 2025
Co-rewarding Collection Co-rewarding is a novel self-supervised RL framework that improves training stability by seeking complementary supervision from other views. • 75 items • Updated 11 days ago