daixuancheng/Qwen3-VL-8B-Thinking_multisub_kaiyuanTiankong_resplen8192_sp2_gentp2_step20 9B • Updated 8 days ago • 19
daixuancheng/Qwen3-VL-8B-Thinking_multisub_kaiyuanTiankong_resplen8192_sp2_gentp2_step36 9B • Updated 8 days ago • 17
daixuancheng/Qwen3-VL-8B-Thinking_multisub_kaiyuanTiankong_resplen8192_sp2_gentp2_step4 9B • Updated 8 days ago • 19
daixuancheng/Qwen3-VL-8B-Thinking_multisub_kaiyuanTiankong_resplen8192_sp2_gentp2_step10 9B • Updated 8 days ago • 16
daixuancheng/ppo_sac_static0.1_constrainbyadv_step-80_actor Text Generation • 8B • Updated Jun 26 • 2
daixuancheng/ppo_sac_static0.1_constrainbyadv_step-20_actor Text Generation • 8B • Updated Jun 26 • 4
daixuancheng/ppo_sac_static0.1_constrainbyadv_step-80_critic Text Generation • 8B • Updated Jun 25 • 3
daixuancheng/ppo_sac_static0.1_constrainbyadv_step-20_critic Text Generation • 8B • Updated Jun 25 • 4
daixuancheng/ppo_sac_static0.1_constrainbyadv_step-60_critic Text Generation • 8B • Updated Jun 25 • 4
daixuancheng/ppo_sac_static0.1_constrainbyadv_step-120_critic Text Generation • 8B • Updated Jun 25 • 7
daixuancheng/ppo_sample8_critic-warm10-lr2e-6_step120_crtic Text Generation • 8B • Updated Jun 25 • 5
daixuancheng/ppo_sac_static0.1_constrainbyadv_step-120_actor Text Generation • 8B • Updated Jun 25 • 5
daixuancheng/ppo_sac_static0.1_constrainbyadv_step-60_actor Text Generation • 8B • Updated Jun 25 • 4
daixuancheng/ppo_sample8_critic-warm10-lr2e-6_step120_actor Text Generation • 8B • Updated Jun 25 • 4
daixuancheng/zero_7b_base_useTokenLoss_clipHigh_KLcoeff0_step80 Text Generation • 8B • Updated Jun 25 • 6
daixuancheng/zero_7b_base_useTokenLoss_clipHigh_KLcoeff0_step60 Text Generation • 8B • Updated Jun 25 • 5
daixuancheng/zero_7b_base_useTokenLoss_clipHigh_KLcoeff0_step120 Text Generation • 8B • Updated Jun 25 • 6
daixuancheng/zero_7b_base_useTokenLoss_clipHigh_KLcoeff0_step20 Text Generation • 8B • Updated Jun 25 • 6