The Shape of Addition: Geometric Structures of Arithmetic in Large Language Models Paper • 2606.03645 • Published 14 days ago • 5
LongLive-RAG: A General Retrieval-Augmented Framework for Long Video Generation Paper • 2606.02553 • Published 11 days ago • 19
LLMs as Noisy Channels: A Shannon Perspective on Model Capacity and Scaling Laws Paper • 2605.23901 • Published 21 days ago • 13
BitCPM-CANN Collection Full-pipeline ternary quantized model trained on CANN. • 12 items • Updated 18 days ago • 27
Mix-Quant: Quantized Prefilling, Precise Decoding for Agentic LLMs Paper • 2605.20315 • Published 24 days ago • 28
Enhancing Train-Free Infinite-Frame Generation for Consistent Long Videos Paper • 2605.18233 • Published 25 days ago • 92
Post-Trained MoE Can Skip Half Experts via Self-Distillation Paper • 2605.18643 • Published 25 days ago • 30
StateSMix: Online Lossless Compression via Mamba State Space Models and Sparse N-gram Context Mixing Paper • 2605.02904 • Published Apr 5 • 8
Attention Sink in Transformers: A Survey on Utilization, Interpretation, and Mitigation Paper • 2604.10098 • Published Apr 11 • 82
InCoder-32B: Code Foundation Model for Industrial Scenarios Paper • 2603.16790 • Published Mar 17 • 312
FineRMoE: Dimension Expansion for Finer-Grained Expert with Its Upcycling Approach Paper • 2603.13364 • Published Mar 9 • 9
The Curse and Blessing of Mean Bias in FP4-Quantized LLM Training Paper • 2603.10444 • Published Mar 11 • 12
Mixture of Attention Heads: Selecting Attention Heads Per Token Paper • 2210.05144 • Published Oct 11, 2022 • 3
MeKi: Memory-based Expert Knowledge Injection for Efficient LLM Scaling Paper • 2602.03359 • Published Feb 3 • 10
MemoryLLM: Plug-n-Play Interpretable Feed-Forward Memory for Transformers Paper • 2602.00398 • Published Jan 30 • 6