Squeezing Capacity from Multimodal Large Language Models for Subject-driven Generation Paper • 2605.26111 • Published 10 days ago • 11
Good Token Hunting: A Hitchhiker's Guide to Token Selection for Visual Geometry Transformers Paper • 2605.23892 • Published 13 days ago • 8
PiD: Fast and High-Resolution Latent Decoding with Pixel Diffusion Paper • 2605.23902 • Published 13 days ago • 45
MosaicMem: Hybrid Spatial Memory for Controllable Video World Models Paper • 2603.17117 • Published Mar 17 • 87
Test-Time Training with KV Binding Is Secretly Linear Attention Paper • 2602.21204 • Published Feb 24 • 32
Lexicon3D: Probing Visual Foundation Models for Complex 3D Scene Understanding Paper • 2409.03757 • Published Sep 5, 2024 • 3
Track, Inpaint, Resplat: Subject-driven 3D and 4D Generation with Progressive Texture Infilling Paper • 2510.23605 • Published Oct 27, 2025 • 6