Plan-X: Instruct Video Generation via Semantic Planning Paper • 2511.17986 • Published Nov 22 • 16
villa-X: Enhancing Latent Action Modeling in Vision-Language-Action Models Paper • 2507.23682 • Published Jul 31 • 23
naver/DUSt3R_ViTLarge_BaseDecoder_512_dpt Image-to-3D • 0.6B • Updated Jul 12, 2024 • 29.1k • 16
Long-Video Audio Synthesis with Multi-Agent Collaboration Paper • 2503.10719 • Published Mar 13 • 9
Junyi42/MonST3R_PO-TA-S-W_ViTLarge_BaseDecoder_512_dpt Image-to-3D • 0.6B • Updated Oct 30, 2024 • 657k • 20