ViewSpatial-Bench: Evaluating Multi-perspective Spatial Localization in Vision-Language Models Paper • 2505.21500 • Published May 27, 2025 • 13
OmniEAR: Benchmarking Agent Reasoning in Embodied Tasks Paper • 2508.05614 • Published Aug 7, 2025 • 20
SpatialLadder: Progressive Training for Spatial Reasoning in Vision-Language Models Paper • 2510.08531 • Published Oct 9, 2025 • 12
KnowU-Bench: Towards Interactive, Proactive, and Personalized Mobile Agent Evaluation Paper • 2604.08455 • Published 3 days ago • 35
SKILL0: In-Context Agentic Reinforcement Learning for Skill Internalization Paper • 2604.02268 • Published 10 days ago • 92
PEARL: Personalized Streaming Video Understanding Model Paper • 2603.20422 • Published 22 days ago • 40
Proact-VL: A Proactive VideoLLM for Real-Time AI Companions Paper • 2603.03447 • Published Mar 3 • 37
WebVR: Benchmarking Multimodal LLMs for WebPage Recreation from Videos via Human-Aligned Visual Rubrics Paper • 2603.13391 • Published Mar 11 • 19
CoCo: Code as CoT for Text-to-Image Preview and Rare Concept Generation Paper • 2603.08652 • Published Mar 9 • 40
GEBench: Benchmarking Image Generation Models as GUI Environments Paper • 2602.09007 • Published Feb 9 • 39
How Well Do Models Follow Visual Instructions? VIBE: A Systematic Benchmark for Visual Instruction-Driven Image Editing Paper • 2602.01851 • Published Feb 2 • 16
EasySteer: A Unified Framework for High-Performance and Extensible LLM Steering Paper • 2509.25175 • Published Sep 29, 2025 • 31
GSM8K-V: Can Vision Language Models Solve Grade School Math Word Problems in Visual Contexts Paper • 2509.25160 • Published Sep 29, 2025 • 32