Qwen3-0.6B-alphabet-sort-grpo

This model was trained using GRPO with the ๐Ÿ”€ alphabet-sort RL environment.

Compared to the original model, it shows improved performance on this alphabetical sorting task.

โžก๏ธ For training walkthrough, evaluation and other details, refer to this article.

Downloads last month
4
Safetensors
Model size
0.6B params
Tensor type
BF16
ยท
Inference Providers NEW
This model isn't deployed by any Inference Provider. ๐Ÿ™‹ Ask for provider support

Model tree for anakin87/Qwen3-0.6B-alphabet-sort-grpo

Finetuned
Qwen/Qwen3-0.6B
Finetuned
(421)
this model