Training Summary
============================================================
Total steps: 58
Early stopping: Yes
Stopped after 10 evaluations without improvement
Best validation reward: 1.8787
Best checkpoint step: 38
Final validation reward: 1.2003
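
Note: the stopping rule reported above (stop after 10 evaluations without improvement, keep the best checkpoint) can be sketched as follows. This is a minimal illustration, not the code that produced this log. The function name `run_early_stopping`, the strict ">" comparison, and the demo rewards are all assumptions made for the example.

```python
def run_early_stopping(rewards, patience=10):
    """Scan validation rewards in order and apply patience-based early stopping.

    Returns (best_index, best_reward, stop_index), where stop_index is the
    index of the evaluation at which training would halt.
    Hypothetical helper: the real trainer's rule may differ in details
    (e.g. a minimum-delta threshold or >= instead of >).
    """
    best_reward = float("-inf")
    best_index = -1
    stale = 0  # evaluations since the last improvement

    for i, reward in enumerate(rewards):
        if reward > best_reward:
            # New best: remember it and reset the patience counter.
            best_reward, best_index, stale = reward, i, 0
        else:
            stale += 1
            if stale >= patience:
                return best_index, best_reward, i

    # Ran out of evaluations before patience was exhausted.
    return best_index, best_reward, len(rewards) - 1


# Synthetic rewards for illustration only (not taken from this run).
demo = [0.5, 0.9, 0.7, 0.6, 0.8]
print(run_early_stopping(demo, patience=3))  # -> (1, 0.9, 4)
```

If evaluations are evenly spaced, the figures above (best checkpoint at step 38, stop at step 58, patience 10) would be consistent with one evaluation every 2 steps, since 38 + 10 * 2 = 58. The log does not state the evaluation interval, so treat that as an inference.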
