-
EVA-CLIP-18B: Scaling CLIP to 18 Billion Parameters
Paper • 2402.04252 • Published • 29 -
Vision Superalignment: Weak-to-Strong Generalization for Vision Foundation Models
Paper • 2402.03749 • Published • 14 -
ScreenAI: A Vision-Language Model for UI and Infographics Understanding
Paper • 2402.04615 • Published • 44 -
EfficientViT-SAM: Accelerated Segment Anything Model Without Performance Loss
Paper • 2402.05008 • Published • 23
Collections
Discover the best community collections!
Collections including paper arxiv:2501.14249
-
Humanity's Last Exam
Paper • 2501.14249 • Published • 77 -
Beyond the Imitation Game: Quantifying and extrapolating the capabilities of language models
Paper • 2206.04615 • Published • 5 -
Challenging BIG-Bench Tasks and Whether Chain-of-Thought Can Solve Them
Paper • 2210.09261 • Published • 1 -
BIG-Bench Extra Hard
Paper • 2502.19187 • Published • 10
-
Evolving Deeper LLM Thinking
Paper • 2501.09891 • Published • 115 -
PaSa: An LLM Agent for Comprehensive Academic Paper Search
Paper • 2501.10120 • Published • 54 -
Multiple Choice Questions: Reasoning Makes Large Language Models (LLMs) More Self-Confident Even When They Are Wrong
Paper • 2501.09775 • Published • 32 -
ComplexFuncBench: Exploring Multi-Step and Constrained Function Calling under Long-Context Scenario
Paper • 2501.10132 • Published • 22
-
MiniMax-01: Scaling Foundation Models with Lightning Attention
Paper • 2501.08313 • Published • 300 -
rStar-Math: Small LLMs Can Master Math Reasoning with Self-Evolved Deep Thinking
Paper • 2501.04519 • Published • 287 -
Smarter, Better, Faster, Longer: A Modern Bidirectional Encoder for Fast, Memory Efficient, and Long Context Finetuning and Inference
Paper • 2412.13663 • Published • 158 -
Apollo: An Exploration of Video Understanding in Large Multimodal Models
Paper • 2412.10360 • Published • 147
-
Open-Reasoner-Zero: An Open Source Approach to Scaling Up Reinforcement Learning on the Base Model
Paper • 2503.24290 • Published • 62 -
I Have Covered All the Bases Here: Interpreting Reasoning Features in Large Language Models via Sparse Autoencoders
Paper • 2503.18878 • Published • 119 -
START: Self-taught Reasoner with Tools
Paper • 2503.04625 • Published • 113 -
DAPO: An Open-Source LLM Reinforcement Learning System at Scale
Paper • 2503.14476 • Published • 144
-
Humanity's Last Exam
Paper • 2501.14249 • Published • 77 -
Benchmarking LLMs for Political Science: A United Nations Perspective
Paper • 2502.14122 • Published • 2 -
IFIR: A Comprehensive Benchmark for Evaluating Instruction-Following in Expert-Domain Information Retrieval
Paper • 2503.04644 • Published • 21 -
ExpertGenQA: Open-ended QA generation in Specialized Domains
Paper • 2503.02948 • Published
-
CodeElo: Benchmarking Competition-level Code Generation of LLMs with Human-comparable Elo Ratings
Paper • 2501.01257 • Published • 51 -
OmniEval: An Omnidirectional and Automatic RAG Evaluation Benchmark in Financial Domain
Paper • 2412.13018 • Published • 41 -
ProcessBench: Identifying Process Errors in Mathematical Reasoning
Paper • 2412.06559 • Published • 85 -
MotionBench: Benchmarking and Improving Fine-grained Video Motion Understanding for Vision Language Models
Paper • 2501.02955 • Published • 44
-
EVA-CLIP-18B: Scaling CLIP to 18 Billion Parameters
Paper • 2402.04252 • Published • 29 -
Vision Superalignment: Weak-to-Strong Generalization for Vision Foundation Models
Paper • 2402.03749 • Published • 14 -
ScreenAI: A Vision-Language Model for UI and Infographics Understanding
Paper • 2402.04615 • Published • 44 -
EfficientViT-SAM: Accelerated Segment Anything Model Without Performance Loss
Paper • 2402.05008 • Published • 23
-
Humanity's Last Exam
Paper • 2501.14249 • Published • 77 -
Beyond the Imitation Game: Quantifying and extrapolating the capabilities of language models
Paper • 2206.04615 • Published • 5 -
Challenging BIG-Bench Tasks and Whether Chain-of-Thought Can Solve Them
Paper • 2210.09261 • Published • 1 -
BIG-Bench Extra Hard
Paper • 2502.19187 • Published • 10
-
Open-Reasoner-Zero: An Open Source Approach to Scaling Up Reinforcement Learning on the Base Model
Paper • 2503.24290 • Published • 62 -
I Have Covered All the Bases Here: Interpreting Reasoning Features in Large Language Models via Sparse Autoencoders
Paper • 2503.18878 • Published • 119 -
START: Self-taught Reasoner with Tools
Paper • 2503.04625 • Published • 113 -
DAPO: An Open-Source LLM Reinforcement Learning System at Scale
Paper • 2503.14476 • Published • 144
-
Evolving Deeper LLM Thinking
Paper • 2501.09891 • Published • 115 -
PaSa: An LLM Agent for Comprehensive Academic Paper Search
Paper • 2501.10120 • Published • 54 -
Multiple Choice Questions: Reasoning Makes Large Language Models (LLMs) More Self-Confident Even When They Are Wrong
Paper • 2501.09775 • Published • 32 -
ComplexFuncBench: Exploring Multi-Step and Constrained Function Calling under Long-Context Scenario
Paper • 2501.10132 • Published • 22
-
Humanity's Last Exam
Paper • 2501.14249 • Published • 77 -
Benchmarking LLMs for Political Science: A United Nations Perspective
Paper • 2502.14122 • Published • 2 -
IFIR: A Comprehensive Benchmark for Evaluating Instruction-Following in Expert-Domain Information Retrieval
Paper • 2503.04644 • Published • 21 -
ExpertGenQA: Open-ended QA generation in Specialized Domains
Paper • 2503.02948 • Published
-
MiniMax-01: Scaling Foundation Models with Lightning Attention
Paper • 2501.08313 • Published • 300 -
rStar-Math: Small LLMs Can Master Math Reasoning with Self-Evolved Deep Thinking
Paper • 2501.04519 • Published • 287 -
Smarter, Better, Faster, Longer: A Modern Bidirectional Encoder for Fast, Memory Efficient, and Long Context Finetuning and Inference
Paper • 2412.13663 • Published • 158 -
Apollo: An Exploration of Video Understanding in Large Multimodal Models
Paper • 2412.10360 • Published • 147
-
CodeElo: Benchmarking Competition-level Code Generation of LLMs with Human-comparable Elo Ratings
Paper • 2501.01257 • Published • 51 -
OmniEval: An Omnidirectional and Automatic RAG Evaluation Benchmark in Financial Domain
Paper • 2412.13018 • Published • 41 -
ProcessBench: Identifying Process Errors in Mathematical Reasoning
Paper • 2412.06559 • Published • 85 -
MotionBench: Benchmarking and Improving Fine-grained Video Motion Understanding for Vision Language Models
Paper • 2501.02955 • Published • 44