Training Summary
============================================================
Total steps: 58
Early stopping: Yes
Stopped after 10 evaluations without improvement
Best validation reward: 3.2676
Best checkpoint step: 38
Final validation reward: 2.9473
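
Note on the stopping rule: the summary reports that training stopped after 10 evaluations without improvement, but it does not say how "improvement" is defined or how often evaluation runs. The sketch below shows one common patience-based rule. It assumes a strict increase over the best reward so far counts as improvement. The function name `find_early_stop` and the sample rewards are hypothetical and are not taken from this run.

```python
from typing import Optional, Sequence, Tuple


def find_early_stop(
    rewards: Sequence[float], patience: int = 10
) -> Tuple[int, Optional[int]]:
    """Return (best_index, stop_index) for a list of validation rewards.

    stop_index is the evaluation at which `patience` consecutive
    evaluations have passed without beating the best reward, or None
    if that never happens. Assumes higher reward is better.
    """
    if not rewards:
        raise ValueError("rewards must contain at least one evaluation")

    best_idx = 0
    stale = 0  # evaluations since the last improvement
    for i in range(1, len(rewards)):
        if rewards[i] > rewards[best_idx]:
            best_idx = i
            stale = 0
        else:
            stale += 1
            if stale >= patience:
                return best_idx, i
    return best_idx, None


# Illustrative values only: three improving evaluations, then a plateau.
example = [1.0, 2.0, 3.0] + [2.5] * 10
best, stop = find_early_stop(example, patience=10)
```

Under this rule the best checkpoint is the one saved at `best`, and the final reward at `stop` can be lower than the best, which matches the pattern in the summary above (final reward below best reward).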
