Article Unlocking asynchronicity in continuous batching +1 ror, pcuenq, ariG23498 • May 14 • 61
Article Mixture of Experts (MoEs) in Transformers +5 ariG23498, pcuenq, merve, IlyasMoutawwakil, ArthurZ, sergiopaniego, Molbap • Feb 26 • 169
Space The Smol Training Playbook 📚 The secrets to building world-class LLMs • Featured • 3.21k
Space Unlocking On-Policy Distillation for Any Model Family 📝 Explore on-policy distillation visualization for any model • 114
Article Efficient MultiModal Data Pipeline +3 ariG23498, lusxvr, andito, sergiopaniego, pcuenq • Jul 8, 2025 • 72