Training Summary
============================================================
Total steps: 56
Early stopping: Yes
Stopped after 10 evaluations without improvement
Best validation reward: 0.5795
Best checkpoint step: 36
Final validation reward: -0.0896
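
Note on the early-stopping figures above: the summary does not say how the stop was decided, so the following is only a minimal sketch of a common patience-based rule, not the trainer's actual code. The function name `early_stop`, its parameters, and the "higher reward is better" comparison are assumptions made for illustration. If evaluation ran every 2 steps (also an assumption; the interval is not reported), 10 evaluations without improvement after the best checkpoint at step 36 would end at step 36 + 10 * 2 = 56, which matches the reported total.

```python
from typing import Iterable, Optional, Tuple


def early_stop(
    evals: Iterable[Tuple[int, float]],
    patience: int = 10,
) -> Tuple[Optional[int], float, Optional[int], bool]:
    """Hypothetical patience-based early stopping over (step, reward) pairs.

    Returns (best_step, best_reward, last_step_seen, stopped_early).
    Assumes a higher validation reward is better.
    """
    best_reward = float("-inf")
    best_step: Optional[int] = None
    last_step: Optional[int] = None
    stale = 0  # consecutive evaluations without a new best reward

    for step, reward in evals:
        last_step = step
        if reward > best_reward:
            # New best: remember it and reset the patience counter.
            best_reward, best_step, stale = reward, step, 0
        else:
            stale += 1
            if stale >= patience:
                return best_step, best_reward, last_step, True

    # Ran out of evaluations before patience was exhausted.
    return best_step, best_reward, last_step, False
```

Under this rule the final validation reward can be far below the best one, as in the summary above (-0.0896 versus 0.5795), which is why the best checkpoint rather than the last one is usually the one to keep.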
