Bingo: Boosting Efficient Reasoning of LLMs via Dynamic and Significance-based Reinforcement Learning
Paper: arXiv:2506.08125
Bingo is a reinforcement learning (RL) framework designed to improve the efficiency of reasoning in large language models.
It introduces two key mechanisms, reflected in its title: a dynamic reward and a significance-based reward.
This approach achieves a favorable balance between accuracy and efficiency, outperforming vanilla rewards and prior length-based reward baselines.
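One way to look at the balance between accuracy and efficiency is to report accuracy alongside mean response length for each model. The sketch below is only an illustration of that bookkeeping, not the paper's evaluation protocol; the `Sample` record and its `correct` and `num_tokens` fields are hypothetical names chosen for this example.

```python
from dataclasses import dataclass
from typing import Iterable, Tuple


@dataclass
class Sample:
    """One evaluated response (hypothetical record layout)."""
    correct: bool    # whether the final answer matched the reference
    num_tokens: int  # length of the generated response, in tokens


def summarize(samples: Iterable[Sample]) -> Tuple[float, float]:
    """Return (accuracy, mean response length in tokens)."""
    items = list(samples)
    if not items:
        raise ValueError("summarize() needs at least one sample")
    accuracy = sum(s.correct for s in items) / len(items)
    mean_length = sum(s.num_tokens for s in items) / len(items)
    return accuracy, mean_length
```

Comparing the two numbers across a baseline and a Bingo checkpoint on the same prompts shows whether shorter responses came at a cost in accuracy.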
The released checkpoints are trained from DeepSeek-R1-Distill-Qwen-1.5B and target reasoning-intensive tasks.
They correspond to the folders r1_1.5b_Bingo_A and r1_1.5b_Bingo_E.
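A minimal loading sketch follows. It assumes the two folders have been downloaded locally and are stored in the standard Hugging Face `transformers` format, which this page does not confirm. The helper names `checkpoint_dir` and `load_checkpoint`, the `VARIANTS` mapping, and the `root` argument are hypothetical.

```python
from pathlib import Path

# Folder names as listed on this page; the short keys are an assumption.
VARIANTS = {"A": "r1_1.5b_Bingo_A", "E": "r1_1.5b_Bingo_E"}


def checkpoint_dir(root: str, variant: str) -> Path:
    """Map a variant key ("A" or "E") to its folder under `root`."""
    if variant not in VARIANTS:
        raise ValueError(f"unknown variant {variant!r}; expected one of {sorted(VARIANTS)}")
    return Path(root) / VARIANTS[variant]


def load_checkpoint(root: str, variant: str):
    """Load tokenizer and model, assuming a transformers-style folder."""
    # Imported lazily so the path helper works without transformers installed.
    from transformers import AutoModelForCausalLM, AutoTokenizer

    path = str(checkpoint_dir(root, variant))
    tokenizer = AutoTokenizer.from_pretrained(path)
    model = AutoModelForCausalLM.from_pretrained(path)
    return tokenizer, model
```

For example, `load_checkpoint("./checkpoints", "A")` would read from `./checkpoints/r1_1.5b_Bingo_A` if that folder exists locally.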
If you use these models, please cite:
@article{liu2025bingo,
title = {Bingo: Boosting Efficient Reasoning of LLMs via Dynamic and Significance-based Reinforcement Learning},
author = {Liu, Hanbing and Cao, Lang and Ren, Yuanyi and Zhou, Mengyu and Dong, Haoyu and Ma, Xiaojun and Han, Shi and Zhang, Dongmei},
journal = {arXiv preprint arXiv:2506.08125},
year = {2025}
}