The Dragon Hatchling: The Missing Link between the Transformer and Models of the Brain • Paper • 2509.26507 • Published Sep 30, 2025 • 544
Megatron-LM: Training Multi-Billion Parameter Language Models Using Model Parallelism • Paper • 1909.08053 • Published Sep 17, 2019 • 5