Training #2
opened by Roman190928
I think you might not have given it enough tokens. In total, it would need 1.6B tokens (20 tokens/parameter), which is over 6GB of raw text data, or 6.8B tokens at 86 tokens/parameter.
I know. I'm working on training the following models on Fineweb-edu.
That's quite a lot of tokens...
I would recommend choosing a ratio (e.g. 86 tokens/parameter) and calculating the amount of data needed for that ratio.
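A minimal sketch of that calculation, assuming an 80M-parameter model (the size implied by the 1.6B-token / 20 tokens-per-parameter figures above) and roughly 4 bytes of raw text per token; both numbers are rough assumptions, not measurements:

```python
# Estimate training-data requirements from a tokens-per-parameter ratio.
# Assumptions: 80M parameters (implied by the 1.6B-token figure above)
# and ~4 bytes of raw text per token, both ballpark values.

def data_needed(num_params: int, tokens_per_param: int, bytes_per_token: float = 4.0):
    """Return (total tokens, raw text size in GB) for a given ratio."""
    tokens = num_params * tokens_per_param
    gigabytes = tokens * bytes_per_token / 1e9
    return tokens, gigabytes

for ratio in (20, 86):  # Chinchilla-style 20 tok/param vs. the 86 tok/param ratio above
    tokens, gb = data_needed(80_000_000, ratio)
    print(f"{ratio} tok/param -> {tokens / 1e9:.1f}B tokens, ~{gb:.0f}GB raw text")
```

This reproduces the numbers in the thread: 20 tokens/parameter gives 1.6B tokens (~6GB of raw text), and 86 tokens/parameter gives ~6.9B tokens.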
Roman190928 changed discussion status to closed
I will take this into account for the following models.