OpenBioRQ: Unsolved Biomedical Research Questions for Agents Paper • 2606.21959 • Published 11 days ago • 4
Ko-WideSearch: A Korean Breadth-Search Benchmark for Exhaustive Set Enumeration by Web Agents Paper • 2606.27595 • Published 6 days ago • 6
A Matter of TASTE: Improving Coverage and Difficulty of Agent Benchmarks Paper • 2605.28556 • Published May 27 • 73
OCC-RAG: Optimal Cognitive Core for Faithful Question Answering Paper • 2606.00683 • Published May 30 • 98
GrepSeek: Training Search Agents for Direct Corpus Interaction Paper • 2605.29307 • Published May 28 • 115
The MiniMax-M2 Series: Mini Activations Unleashing Max Real-World Intelligence Paper • 2605.26494 • Published May 26 • 41
OmniRetrieval: Unified Retrieval across Heterogeneous Knowledge Sources Paper • 2605.29250 • Published May 28 • 79
The Curious Case of Analogies: Investigating Analogical Reasoning in Large Language Models Paper • 2511.20344 • Published Nov 25, 2025 • 14
Thinking Sparks!: Emergent Attention Heads in Reasoning Models During Post Training Paper • 2509.25758 • Published Sep 30, 2025 • 25
Soohak: A Mathematician-Curated Benchmark for Evaluating Research-level Math Capabilities of LLMs Paper • 2605.09063 • Published May 9 • 82
InteractWeb-Bench: Can Multimodal Agent Escape Blind Execution in Interactive Website Generation? Paper • 2604.27419 • Published Apr 30 • 13
ASGuard: Activation-Scaling Guard to Mitigate Targeted Jailbreaking Attack Paper • 2509.25843 • Published Apr 14 • 20
System Message Generation for User Preferences using Open-Source Models Paper • 2502.11330 • Published Feb 17, 2025 • 15
Does Time Have Its Place? Temporal Heads: Where Language Models Recall Time-specific Information Paper • 2502.14258 • Published Feb 20, 2025 • 26