Text Generation
Transformers
PyTorch
Safetensors
English
i3
i3-architecture
hybrid-model
rwkv-mamba
custom_code

Training

#2
by Roman190928 - opened

I think you might not have given it enough tokens. In total, it would need 1.6B tokens, or over 6 GB of raw text data, at 20 tokens per parameter, or 6.8B tokens at 86 tokens per parameter.

i3-lab org

I know. I'm working on training the following models on Fineweb-edu.

That's quite a lot of tokens.
I would recommend choosing a ratio (e.g., 86 tokens per parameter) and calculating the amount of data needed for that ratio.
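The calculation above can be sketched in a few lines. This is a back-of-the-envelope estimate, not an exact figure: the model size of 80M parameters and the ~4 bytes of raw UTF-8 text per token are assumptions chosen to be consistent with the 1.6B-token / >6 GB figures quoted earlier in the thread.

```python
def tokens_needed(params: int, tokens_per_param: int) -> int:
    """Total training tokens for a given tokens-per-parameter ratio."""
    return params * tokens_per_param

def raw_text_gb(tokens: int, bytes_per_token: float = 4.0) -> float:
    """Rough raw-text size in GB, assuming ~4 bytes of UTF-8 per token (assumption)."""
    return tokens * bytes_per_token / 1e9

# Assumed 80M-parameter model, consistent with the figures in the thread.
params = 80_000_000

chinchilla_tokens = tokens_needed(params, 20)   # 20 tok/param -> 1.6B tokens
heavy_tokens = tokens_needed(params, 86)        # 86 tok/param -> 6.88B tokens

print(chinchilla_tokens, raw_text_gb(chinchilla_tokens))  # 1.6B tokens, ~6.4 GB
print(heavy_tokens, raw_text_gb(heavy_tokens))            # 6.88B tokens, ~27.5 GB
```

The bytes-per-token figure varies with the tokenizer and corpus, so the GB numbers should be treated as order-of-magnitude estimates.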

Roman190928 changed discussion status to closed
i3-lab org

I'll take this into account for the following models.
