Collections including paper arxiv:2512.14699

- Deep Forcing: Training-Free Long Video Generation with Deep Sink and Participative Compression
  Paper • 2512.05081 • Published • 31
- Reward Forcing: Efficient Streaming Video Generation with Rewarded Distribution Matching Distillation
  Paper • 2512.04678 • Published • 42
- What about gravity in video generation? Post-Training Newton's Laws with Verifiable Rewards
  Paper • 2512.00425 • Published • 51
- Flash-DMD: Towards High-Fidelity Few-Step Image Generation with Efficient Distillation and Joint Reinforcement Learning
  Paper • 2511.20549 • Published • 26

- lusxvr/nanoVLM-222M
  Image-Text-to-Text • 0.2B • Updated • 170 • 98
- Search-R1: Training LLMs to Reason and Leverage Search Engines with Reinforcement Learning
  Paper • 2503.09516 • Published • 38
- AlphaOne: Reasoning Models Thinking Slow and Fast at Test Time
  Paper • 2505.24863 • Published • 97
- QwenLong-L1: Towards Long-Context Large Reasoning Models with Reinforcement Learning
  Paper • 2505.17667 • Published • 88

- EVA-CLIP-18B: Scaling CLIP to 18 Billion Parameters
  Paper • 2402.04252 • Published • 29
- Vision Superalignment: Weak-to-Strong Generalization for Vision Foundation Models
  Paper • 2402.03749 • Published • 15
- ScreenAI: A Vision-Language Model for UI and Infographics Understanding
  Paper • 2402.04615 • Published • 44
- EfficientViT-SAM: Accelerated Segment Anything Model Without Performance Loss
  Paper • 2402.05008 • Published • 23

- ReVSeg: Incentivizing the Reasoning Chain for Video Segmentation with Reinforcement Learning
  Paper • 2512.02835 • Published • 10
- Joint 3D Geometry Reconstruction and Motion Generation for 4D Synthesis from a Single Image
  Paper • 2512.05044 • Published • 17
- Entropy Ratio Clipping as a Soft Global Constraint for Stable Reinforcement Learning
  Paper • 2512.05591 • Published • 17
- SpaceControl: Introducing Test-Time Spatial Control to 3D Generative Modeling
  Paper • 2512.05343 • Published • 25

- Migician: Revealing the Magic of Free-Form Multi-Image Grounding in Multimodal Large Language Models
  Paper • 2501.05767 • Published • 29
- An Empirical Study of Autoregressive Pre-training from Videos
  Paper • 2501.05453 • Published • 41
- Sa2VA: Marrying SAM2 with LLaVA for Dense Grounded Understanding of Images and Videos
  Paper • 2501.04001 • Published • 47
- MemFlow: Flowing Adaptive Memory for Consistent and Efficient Long Video Narratives
  Paper • 2512.14699 • Published • 28