---
language:
- en
---

**lexdec-large-bpe** is a small, autoregressive Llama model with subword (BPE) tokenization, trained on the 2024/2025 [BabyLM dataset](https://osf.io/ryjfm/). The *checkpoints* branch contains 19 checkpoints: 10 across the first 10% of pretraining and 9 more across the remaining 90%.
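
The sketch below shows one way to load the model with the Hugging Face `transformers` API. It assumes the repository follows the standard causal-LM layout; the `revision` argument is only illustrative of how to reach the *checkpoints* branch, whose exact commit layout is not described here.

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "bbunzeck/lexdec-large-bpe"

# Load the fully trained model from the main branch.
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id)

# Intermediate checkpoints live on the *checkpoints* branch; pass the branch
# name or a specific commit hash from that branch via `revision`.
# model = AutoModelForCausalLM.from_pretrained(model_id, revision="checkpoints")

# Quick sanity check: greedy continuation of a short prompt.
inputs = tokenizer("The little girl said", return_tensors="pt")
output = model.generate(**inputs, max_new_tokens=20, do_sample=False)
print(tokenizer.decode(output[0], skip_special_tokens=True))
```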
We used this model to trace the development of linguistic knowledge (lexical and syntactic) across pretraining and to compare it with character-level and subword models of different sizes:

|  | [small-char](https://huggingface.co/bbunzeck/lexdec-small-char) | [medium-char](https://huggingface.co/bbunzeck/lexdec-medium-char) | [large-char](https://huggingface.co/bbunzeck/lexdec-large-char) | [small-bpe](https://huggingface.co/bbunzeck/lexdec-small-bpe) | [medium-bpe](https://huggingface.co/bbunzeck/lexdec-medium-bpe) | [large-bpe](https://huggingface.co/bbunzeck/lexdec-large-bpe) |
|---|---:|---:|---:|---:|---:|---:|
| Embedding size | 128 | 256 | 512 | 128 | 256 | 512 |
| Hidden size | 128 | 256 | 512 | 128 | 256 | 512 |
| Layers | 4 | 8 | 12 | 4 | 8 | 12 |
| Attention heads | 4 | 8 | 12 | 4 | 8 | 12 |
| Context size | 128 | 128 | 128 | 128 | 128 | 128 |
| Vocab. size | 102 | 102 | 102 | 8,002 | 8,002 | 8,002 |
| Parameters | 486,016 | 3,726,592 | 21,940,736 | 2,508,416 | 7,771,392 | 30,030,336 |
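
For orientation, the large-bpe column corresponds roughly to the following `transformers` configuration. This is only a sketch: fields not listed in the table (such as `intermediate_size` or weight tying) are left at library defaults here, so the exact parameter count will differ, and the `config.json` shipped with the repository remains authoritative.

```python
from transformers import LlamaConfig

# Rough sketch of the large-bpe architecture from the table above.
# Only values given in the table are set; unlisted fields keep
# library defaults and may differ from the released model.
config = LlamaConfig(
    vocab_size=8_002,
    hidden_size=512,
    num_hidden_layers=12,
    num_attention_heads=12,
    max_position_embeddings=128,
)
print(config)
```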
If you use this model, please cite the following preprint (the final version will be added as soon as it is published):

```bibtex
@misc{bunzeck2025subwordmodelsstruggleword,
      title={Subword models struggle with word learning, but surprisal hides it},
      author={Bastian Bunzeck and Sina Zarrieß},
      year={2025},
      eprint={2502.12835},
      archivePrefix={arXiv},
      primaryClass={cs.CL},
      url={https://arxiv.org/abs/2502.12835},
}
```