HandVQA: Diagnosing and Improving Fine-Grained Spatial Reasoning about Hands in Vision-Language Models Paper • 2603.26362 • Published 6 days ago
LighthouseGS: Indoor Structure-aware 3D Gaussian Splatting for Panorama-Style Mobile Captures Paper • 2507.06109 • Published Jul 8, 2025
Enabling Chatbots with Eyes and Ears: An Immersive Multimodal Conversation System for Dynamic Interactions Paper • 2506.00421 • Published May 31, 2025 • 5
Benchmarks and Challenges in Pose Estimation for Egocentric Hand Interactions with Objects Paper • 2403.16428 • Published Mar 25, 2024
BIGS: Bimanual Category-agnostic Interaction Reconstruction from Monocular Videos via 3D Gaussian Splatting Paper • 2504.09097 • Published Apr 12, 2025
LLM-based User Profile Management for Recommender System Paper • 2502.14541 • Published Feb 20, 2025 • 6
On-device Sora: Enabling Diffusion-Based Text-to-Video Generation for Mobile Devices Paper • 2502.04363 • Published Feb 5, 2025 • 12
Response Tuning: Aligning Large Language Models without Instruction Paper • 2410.02465 • Published Oct 3, 2024 • 13