Number of Pretraining Tokens per Qwen 2.5 Model?
#9 by RylanSchaeffer - opened
Hi! I'm trying to find out how many tokens each of the Qwen 2.5 models was pretrained on, so I can estimate training FLOPs (the 6ND approximation is fine for my purposes) for a plot in an ML research paper.
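For concreteness, here's a minimal sketch of the estimate I have in mind, assuming the standard C ≈ 6ND approximation (N = parameter count, D = pretraining tokens). The token count used below is a placeholder, since per-model D is exactly the number that's missing:

```python
def training_flops(n_params: float, n_tokens: float) -> float:
    """Approximate training compute via C ≈ 6 * N * D."""
    return 6.0 * n_params * n_tokens

# Example: a 7B-parameter model.
# D here is hypothetical, NOT a confirmed Qwen 2.5 figure.
n = 7e9    # parameters (N)
d = 18e12  # tokens (D) -- placeholder value
print(f"~{training_flops(n, d):.2e} FLOPs")  # ~7.56e+23
```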
This blog post gives architectural details: https://qwenlm.github.io/blog/qwen2.5-llm/
Unfortunately, I can't find the pretraining token counts there. Can someone please clarify?
Cross-posted: https://github.com/QwenLM/Qwen3/discussions/1613
Trying to find the same thing. The Qwen3 report only gives 36T tokens overall, with no per-model D.
