Canvas-to-Image: Compositional Image Generation with Multimodal Controls Paper • 2511.21691 • Published Nov 26, 2025 • 35
VFXMaster: Unlocking Dynamic Visual Effect Generation via In-Context Learning Paper • 2510.25772 • Published Oct 29, 2025 • 32
Thinking with Camera: A Unified Multimodal Model for Camera-Centric Understanding and Generation Paper • 2510.08673 • Published Oct 9, 2025 • 125
EditVerse: Unifying Image and Video Editing and Generation with In-Context Learning Paper • 2509.20360 • Published Sep 24, 2025 • 17
Ultra3D: Efficient and High-Fidelity 3D Generation with Part Attention Paper • 2507.17745 • Published Jul 23, 2025 • 35
Pixels, Patterns, but No Poetry: To See The World like Humans Paper • 2507.16863 • Published Jul 21, 2025 • 68
TaskCraft: Automated Generation of Agentic Tasks Paper • 2506.10055 • Published Jun 11, 2025 • 32
EasyText: Controllable Diffusion Transformer for Multilingual Text Rendering Paper • 2505.24417 • Published May 30, 2025 • 13
Alchemist: Turning Public Text-to-Image Data into Generative Gold Paper • 2505.19297 • Published May 25, 2025 • 84
TEMPURA: Temporal Event Masked Prediction and Understanding for Reasoning in Action Paper • 2505.01583 • Published May 2, 2025 • 8
YoChameleon: Personalized Vision and Language Generation Paper • 2504.20998 • Published Apr 29, 2025 • 12
DreamO: A Unified Framework for Image Customization Paper • 2504.16915 • Published Apr 23, 2025 • 24
VLM-R1: A Stable and Generalizable R1-style Large Vision-Language Model Paper • 2504.07615 • Published Apr 10, 2025 • 35
Seaweed-7B: Cost-Effective Training of Video Generation Foundation Model Paper • 2504.08685 • Published Apr 11, 2025 • 130
Less-to-More Generalization: Unlocking More Controllability by In-Context Generation Paper • 2504.02160 • Published Apr 2, 2025 • 37
One-Minute Video Generation with Test-Time Training Paper • 2504.05298 • Published Apr 7, 2025 • 110