kongqi's Collections

Llm

updated Sep 16

  • AgentGym-RL: Training LLM Agents for Long-Horizon Decision Making through Multi-Turn Reinforcement Learning

    Paper • 2509.08755 • Published Sep 10 • 56

  • The Majority is not always right: RL training for solution aggregation

    Paper • 2509.06870 • Published Sep 8 • 16

  • Parallel-R1: Towards Parallel Thinking via Reinforcement Learning

    Paper • 2509.07980 • Published Sep 9 • 101

  • Emergent Hierarchical Reasoning in LLMs through Reinforcement Learning

    Paper • 2509.03646 • Published Sep 3 • 30