VESTA: Visual Exploration with Statistical Tool Agents • Paper • 2606.00384 • Published 22 days ago • 2
Statistically Reliable LLM-Based Ranking Evaluation via Prediction-Powered Inference • Paper • 2606.05308 • Published 17 days ago • 2
When Gradients Collide: Failure Modes of Multi-Objective Prompt Optimization for LLM Judges • Paper • 2605.26046 • Published 26 days ago • 3
MSTS: A Multimodal Safety Test Suite for Vision-Language Models • Paper • 2501.10057 • Published Jan 17, 2025 • 10