Text Generation
Transformers
PyTorch
Safetensors
English
i3
i3-architecture
hybrid-model
rwkv-mamba
custom_code

Training

#2
by Roman190928 - opened

I think you might not have given it enough tokens. In total, it would need 1.6B tokens, or over 6 GB of raw text data, at 20 tokens per parameter, or 6.8B tokens at 86 tokens per parameter.

i3-lab org

I know. I'm working on training the following models on Fineweb-edu.

That's quite a lot of tokens.
I would recommend choosing a ratio (e.g., 86 tokens per parameter) and calculating the amount of data needed for that ratio.
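The calculation above can be sketched in a few lines. This is a back-of-the-envelope estimate, not an exact figure: the model size of 80M parameters and the ~4 bytes of raw UTF-8 text per token are assumptions chosen to be consistent with the 1.6B-token / >6 GB figures quoted earlier in the thread.

```python
def tokens_needed(params: int, tokens_per_param: int) -> int:
    """Total training tokens for a given tokens-per-parameter ratio."""
    return params * tokens_per_param

def raw_text_gb(tokens: int, bytes_per_token: float = 4.0) -> float:
    """Rough raw-text size in GB, assuming ~4 bytes of UTF-8 per token (assumption)."""
    return tokens * bytes_per_token / 1e9

# Assumed 80M-parameter model, consistent with the figures in the thread.
params = 80_000_000

chinchilla_tokens = tokens_needed(params, 20)   # 20 tok/param -> 1.6B tokens
heavy_tokens = tokens_needed(params, 86)        # 86 tok/param -> 6.88B tokens

print(chinchilla_tokens, raw_text_gb(chinchilla_tokens))  # 1.6B tokens, ~6.4 GB
print(heavy_tokens, raw_text_gb(heavy_tokens))            # 6.88B tokens, ~27.5 GB
```

The bytes-per-token figure varies with the tokenizer and corpus, so the GB numbers should be treated as order-of-magnitude estimates.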

Roman190928 changed discussion status to closed
i3-lab org

I'll take this into account for the following models.
