Qwen3-0.6B-alphabet-sort-grpo

This model was trained using GRPO with the 🔀 alphabet-sort RL environment.

Compared to the original model, it shows improved performance on this alphabetical sorting task.

➡️ For training walkthrough, evaluation and other details, refer to this article.

Safetensors

Model size

0.6B params

Tensor type

BF16

Model tree for anakin87/Qwen3-0.6B-alphabet-sort-grpo

Base model

Finetuned

Finetuned

(421)

this model