CUA-Suite: Massive Human-annotated Video Demonstrations for Computer-Use Agents Paper • 2603.24440 • Published 13 days ago • 94
Reasoning as Compression: Unifying Budget Forcing via the Conditional Information Bottleneck Paper • 2603.08462 • Published 29 days ago • 21
In-Context Reinforcement Learning for Tool Use in Large Language Models Paper • 2603.08068 • Published 30 days ago • 42
Strategic Navigation or Stochastic Search? How Agents and Humans Reason Over Document Collections Paper • 2603.12180 • Published 26 days ago • 64
EnterpriseOps-Gym: Environments and Evaluations for Stateful Agentic Planning and Tool Use in Enterprise Settings Paper • 2603.13594 • Published 25 days ago • 147
MCPMark: A Benchmark for Stress-Testing Realistic and Comprehensive MCP Use Paper • 2509.24002 • Published Sep 28, 2025 • 179
Article Apriel-1.6-15b-Thinker: Cost-efficient Frontier Multimodal Performance Dec 9, 2025 • 84