Qwen/DeepPlanning
Sparse but Critical: A Token-Level Analysis of Distributional Shifts in RLVR Fine-Tuning of LLMs
On the Direction of RLVR Updates for LLM Reasoning: Identification and Exploitation