DAComp: Benchmarking Data Agents across the Full Data Intelligence Lifecycle Paper • 2512.04324 • Published 7 days ago • 146
Recycling Pretrained Checkpoints: Orthogonal Growth of Mixture-of-Experts for Efficient Large Language Model Pre-Training Paper • 2510.08008 • Published Oct 9 • 5
TL;DR: Too Long, Do Re-weighting for Effcient LLM Reasoning Compression Paper • 2506.02678 • Published Jun 3 • 5
Sigma: Differential Rescaling of Query, Key and Value for Efficient Language Models Paper • 2501.13629 • Published Jan 23 • 48