sleeepeer/Meta-Llama-3-8B-Instruct-GRPO-AT-short-20-NEW-TRAIN-AT-3 Text Generation • 8B • Updated Jul 1, 2025
sleeepeer/Meta-Llama-3-8B-Instruct-GRPO-AT-short-16-NEW-TRAIN-new-reward-AT-3 Text Generation • 8B • Updated Jul 1, 2025
sleeepeer/Meta-Llama-3-8B-Instruct-GRPO-AT-short-20-NEW-TRAIN-AT-2 Text Generation • 8B • Updated Jul 1, 2025
sleeepeer/Meta-Llama-3-8B-Instruct-GRPO-AT-short-16-NEW-TRAIN-new-reward-AT-2 Text Generation • 8B • Updated Jul 1, 2025
sleeepeer/Meta-Llama-3-8B-Instruct-GRPO-AT-short-20-NEW-TRAIN-AT-1 Text Generation • 8B • Updated Jun 30, 2025
sleeepeer/Meta-Llama-3-8B-Instruct-GRPO-AT-short-16-NEW-TRAIN-new-reward-AT-1 Text Generation • 8B • Updated Jun 30, 2025
sleeepeer/Meta-Llama-3-8B-Instruct-GRPO-alpaca_combine_100_no_KL_42_reproduce Text Generation • 8B • Updated Jun 27, 2025
sleeepeer/Meta-Llama-3-8B-Instruct-GRPO-AT-combine-100-reproduce-version-3-AT-1 Text Generation • 8B • Updated Jun 26, 2025
sleeepeer/Meta-Llama-3-8B-Instruct-GRPO-AT-combine-100-reproduce-version-AT-1 Text Generation • 8B • Updated Jun 26, 2025
sleeepeer/Meta-Llama-3-8B-Instruct-GRPO-AT-combine-100-reproduce-no-seed-AT-1 Text Generation • 8B • Updated Jun 26, 2025 • 1
sleeepeer/Meta-Llama-3-8B-Instruct-GRPO-AT-combine-100-reproduce-AT-1 Text Generation • 8B • Updated Jun 26, 2025 • 1
sleeepeer/Meta-Llama-3-8B-Instruct-GRPO-AT-combine-10-mix-100-10-more-rounds-AT-9 Text Generation • 8B • Updated Jun 26, 2025
sleeepeer/Meta-Llama-3-8B-Instruct-GRPO-AT-combine-10-mix-100-10-more-rounds-AT-8 Updated Jun 26, 2025
sleeepeer/Meta-Llama-3-8B-Instruct-GRPO-AT-combine-10-mix-100-10-more-rounds-AT-7 Text Generation • 8B • Updated Jun 26, 2025
sleeepeer/Meta-Llama-3-8B-Instruct-GRPO-AT-combine-10-mix-100-10-more-rounds-AT-6 Text Generation • 8B • Updated Jun 26, 2025
sleeepeer/Meta-Llama-3-8B-Instruct-GRPO-AT-combine-10-mix-100-10-more-rounds-AT-5 Text Generation • 8B • Updated Jun 26, 2025
sleeepeer/Meta-Llama-3-8B-Instruct-GRPO-AT-combine-10-mix-100-10-more-rounds-AT-4 Text Generation • 8B • Updated Jun 26, 2025
sleeepeer/Meta-Llama-3-8B-Instruct-GRPO-AT-combine-10-mix-100-10-more-rounds-AT-3 Text Generation • 8B • Updated Jun 26, 2025
sleeepeer/Meta-Llama-3-8B-Instruct-GRPO-AT-combine-10-mix-100-10-more-rounds-AT-2 Text Generation • 8B • Updated Jun 26, 2025 • 1
sleeepeer/Meta-Llama-3-8B-Instruct-GRPO-AT-combine-10-mix-100-10-more-rounds-AT-1 Text Generation • 8B • Updated Jun 26, 2025
sleeepeer/Meta-Llama-3-8B-Instruct-GRPO-AT-combine-50-directly-output-rejected-AT-7 8B • Updated Jun 25, 2025
sleeepeer/Meta-Llama-3-8B-Instruct-GRPO-AT-combine-50-directly-output-rejected-AT-6 8B • Updated Jun 25, 2025
sleeepeer/Meta-Llama-3-8B-Instruct-GRPO-AT-combine-50-directly-output-rejected-AT-5 8B • Updated Jun 25, 2025