VLASH: Real-Time VLAs via Future-State-Aware Asynchronous Inference Paper • 2512.01031 • Published 7 days ago • 22
ParoQuant: Pairwise Rotation Quantization for Efficient Reasoning LLM Inference Paper • 2511.10645 • Published 24 days ago • 3
Concerto: Joint 2D-3D Self-Supervised Learning Emerges Spatial Representations Paper • 2510.23607 • Published Oct 27 • 174
OmniVinci: Enhancing Architecture and Data for Omni-Modal Understanding LLM Paper • 2510.15870 • Published Oct 17 • 89
SparseLoRA: Accelerating LLM Fine-Tuning with Contextual Sparsity Paper • 2506.16500 • Published Jun 19 • 17
Fast-dLLM: Training-free Acceleration of Diffusion LLM by Enabling KV Cache and Parallel Decoding Paper • 2505.22618 • Published May 28 • 44
Token-Efficient Long Video Understanding for Multimodal LLMs Paper • 2503.04130 • Published Mar 6 • 96
StreamDiffusion: A Pipeline-level Solution for Real-time Interactive Generation Paper • 2312.12491 • Published Dec 19, 2023 • 74
LongLoRA: Efficient Fine-tuning of Long-Context Large Language Models Paper • 2309.12307 • Published Sep 21, 2023 • 89