Training Summary
============================================================
Total steps: 92
Early stopping: Yes
Stopped after 10 evaluations without improvement
Best validation reward: 1.9157
Best checkpoint step: 72
Final validation reward: 1.3385
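
Note on the early-stopping rule: the summary reports that training stopped after 10 evaluations without improvement, with the best checkpoint at step 72 of 92. The sketch below shows one common way such a patience rule is implemented; it is not taken from the training code that produced this log. The class name `EarlyStopper`, its fields, and the strict "greater than" comparison are assumptions made for illustration. The log does not state the evaluation interval; an interval of 2 steps would be consistent with the numbers (72 + 10 x 2 = 92), but that is an inference only.

```python
from dataclasses import dataclass


@dataclass
class EarlyStopper:
    """Hypothetical patience-based early stopping on a validation reward."""

    patience: int = 10                  # evaluations allowed without improvement
    best_reward: float = float("-inf")  # highest validation reward seen so far
    best_step: int = -1                 # step at which best_reward was recorded
    stale_evals: int = 0                # consecutive evaluations without improvement

    def update(self, step: int, reward: float) -> bool:
        """Record one evaluation and return True when training should stop."""
        if reward > self.best_reward:
            # New best: remember it and reset the patience counter.
            self.best_reward = reward
            self.best_step = step
            self.stale_evals = 0
        else:
            # No improvement at this evaluation.
            self.stale_evals += 1
        return self.stale_evals >= self.patience
```

Under this rule, the final validation reward (1.3385) can be lower than the best one (1.9157) because the last evaluation is simply the one at which patience ran out; the checkpoint worth keeping is the one saved at the best step.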
