Training Summary
============================================================
Total steps: 44
Early stopping: Yes
Stopped after 10 evaluations without improvement
Best validation reward: 0.0393
Best checkpoint step: 24
Final validation reward: -0.0431
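
The figures above are consistent with patience-based early stopping: the best reward was logged at step 24 and the run ended at step 44 after 10 evaluations without improvement, which would imply one evaluation every 2 steps. That interval is inferred here, not stated in the summary. The sketch below shows one way such a stopping rule could work. The names `EarlyStopper` and `run_with_early_stopping` are hypothetical, and the strict "greater than best" improvement test is an assumption, not the logic of the code that produced this run.

```python
from typing import Dict, Iterable, Optional, Tuple


class EarlyStopper:
    """Patience-based early stopping for a reward that should increase.

    Hypothetical helper for illustration only.
    """

    def __init__(self, patience: int) -> None:
        self.patience = patience
        self.best_reward = float("-inf")
        self.best_step: Optional[int] = None
        self.stale_evals = 0  # evaluations since the last improvement

    def update(self, step: int, reward: float) -> bool:
        """Record one evaluation; return True when training should stop."""
        if reward > self.best_reward:  # assumption: strict improvement
            self.best_reward = reward
            self.best_step = step
            self.stale_evals = 0
        else:
            self.stale_evals += 1
        return self.stale_evals >= self.patience


def run_with_early_stopping(
    evals: Iterable[Tuple[int, float]], patience: int = 10
) -> Dict[str, object]:
    """Consume (step, validation_reward) pairs and build a summary dict."""
    stopper = EarlyStopper(patience)
    last_step: Optional[int] = None
    last_reward: Optional[float] = None
    stopped_early = False
    for step, reward in evals:
        last_step, last_reward = step, reward
        if stopper.update(step, reward):
            stopped_early = True
            break
    return {
        "total_steps": last_step,
        "early_stopping": stopped_early,
        "best_reward": stopper.best_reward,
        "best_step": stopper.best_step,
        "final_reward": last_reward,
    }
```

Under this rule the final validation reward can sit well below the best one, as it does here (-0.0431 versus 0.0393), so the checkpoint from the best step is normally the one to restore, not the last one.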
