Seeing Culture: A Benchmark for Visual Reasoning and Grounding Paper • 2509.16517 • Published Sep 20, 2025 • 3
Can Large Language Models Understand, Reason About, and Generate Code-Switched Text? Paper • 2601.07153 • Published Jan 12
M4-RAG: A Massive-Scale Multilingual Multi-Cultural Multimodal RAG Paper • 2512.05959 • Published Dec 5, 2025
LinguDistill: Recovering Linguistic Ability in Vision-Language Models via Selective Cross-Modal Distillation Paper • 2604.00829 • Published 4 days ago • 5
SWE-Universe: Scale Real-World Verifiable Environments to Millions Paper • 2602.02361 • Published Feb 2 • 60
PingPong: A Natural Benchmark for Multi-Turn Code-Switching Dialogues Paper • 2601.17277 • Published Jan 24 • 6
VideoAgentTrek: Computer Use Pretraining from Unlabeled Videos Paper • 2510.19488 • Published Oct 22, 2025 • 21
Deflanderization for Game Dialogue: Balancing Character Authenticity with Task Execution in LLM-based NPCs Paper • 2510.13586 • Published Oct 15, 2025 • 1
SceMQA: A Scientific College Entrance Level Multimodal Question Answering Benchmark Paper • 2402.05138 • Published Feb 6, 2024 • 2
MTSQL-R1: Towards Long-Horizon Multi-Turn Text-to-SQL via Agentic Training Paper • 2510.12831 • Published Oct 12, 2025 • 5
Talk Less, Call Right: Enhancing Role-Play LLM Agents with Automatic Prompt Optimization and Role Prompting Paper • 2509.00482 • Published Aug 30, 2025
Thai Semantic End-of-Turn Detection for Real-Time Voice Agents Paper • 2510.04016 • Published Oct 5, 2025 • 4