Seed Diffusion: A Large-Scale Diffusion Language Model with High-Speed Inference Paper • 2508.02193 • Published Aug 4 • 132
Native Sparse Attention: Hardware-Aligned and Natively Trainable Sparse Attention Paper • 2502.11089 • Published Feb 16 • 166
DeepSeek-R1: Incentivizing Reasoning Capability in LLMs via Reinforcement Learning Paper • 2501.12948 • Published Jan 22 • 429
Token-Efficient Long Video Understanding for Multimodal LLMs Paper • 2503.04130 • Published Mar 6 • 96
Perception, Reason, Think, and Plan: A Survey on Large Multimodal Reasoning Models Paper • 2505.04921 • Published May 8 • 186
DeepFlow: Serverless Large Language Model Serving at Scale Paper • 2501.14417 • Published Jan 24 • 3 • 2
Byte Latent Transformer: Patches Scale Better Than Tokens Paper • 2412.09871 • Published Dec 13, 2024 • 108
Training Language Models to Self-Correct via Reinforcement Learning Paper • 2409.12917 • Published Sep 19, 2024 • 140