This collection contains datasets and models related to "BLEUBERI: BLEU is a surprisingly effective reward for instruction following".
Yapei Chang
yapeichang
AI & ML interests: NLP
Models (12)
yapeichang/sft_qwen25-7b_v1
yapeichang/sft_olmo7b-base_v1
yapeichang/Llama-3.1-8B-SFT (Text Generation; 33 downloads)
yapeichang/Qwen2.5-7B-BLEUBERI (Text Generation; 22 downloads, 1 like)
yapeichang/Qwen2.5-3B-RM8B (Text Generation; 17 downloads)
yapeichang/Qwen2.5-7B-SFT (Text Generation; 9 downloads)
yapeichang/Qwen2.5-3B-SFT (Text Generation; 72 downloads)
yapeichang/Qwen2.5-3B-BLEUBERI (Text Generation; 11 downloads)
yapeichang/Llama-3.1-8B-RM8B (Text Generation; 3 downloads)
yapeichang/Qwen2.5-7B-RM8B (Text Generation; 18 downloads)