Training Summary
============================================================
Total steps: 60
Early stopping: Yes
Stopped after 10 evaluations without improvement
Best validation reward: 3.2957
Best checkpoint step: 40
Final validation reward: 2.8640
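
The summary reports a patience-style stop: training ended after 10 evaluations without a new best validation reward, and the final reward (2.8640) is 0.4317 below the best (3.2957, step 40). The sketch below shows one common way such a rule is implemented; it is not taken from the code that produced this log. The class name `EarlyStopper`, its fields, and the strict "greater than" improvement test are all assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass
class EarlyStopper:
    """Hypothetical patience-based early stopping on a reward (higher is better)."""

    patience: int = 10                  # evaluations allowed without improvement
    best_reward: float = float("-inf")  # best validation reward seen so far
    best_step: int = -1                 # training step at which it was seen
    stale_evals: int = 0                # evaluations since the last improvement

    def update(self, step: int, reward: float) -> bool:
        """Record one evaluation; return True when training should stop."""
        if reward > self.best_reward:   # assumption: strict improvement required
            self.best_reward = reward
            self.best_step = step
            self.stale_evals = 0
        else:
            self.stale_evals += 1
        return self.stale_evals >= self.patience


# Usage with made-up rewards (not the values behind the log above).
stopper = EarlyStopper(patience=3)
for step, reward in [(1, 0.5), (2, 0.9), (3, 0.8), (4, 0.7), (5, 0.6)]:
    if stopper.update(step, reward):
        break
```

With a rule like this, the checkpoint worth restoring is usually the one at `best_step`, not the last one written, which matches the summary listing the best checkpoint separately from the final reward.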
