Xetrieval: Mechanistically Explaining Dense Retrieval Paper • 2605.29507 • Published 18 days ago • 21
Revisiting a Pain in the Neck: A Semantic Reasoning Benchmark for Language Models Paper • 2604.16593 • Published Apr 17 • 6 • 3
Croissant Checker - Dev 🔎 Space • Running • Agents • 24 • Validate Croissant dataset files for NeurIPS submissions
$OneMillion-Bench: How Far are Language Agents from Human Experts? Paper • 2603.07980 • Published Mar 9 • 27 • 4