MMMU

non-profit

https://mmmu-benchmark.github.io/

AI & ML interests

Multimodal Model Evaluation

Recent Activity

aaabiao authored a paper 11 days ago

Aligning Instruction Tuning with Pre-training

aaabiao authored a paper 11 days ago

YuE: Scaling Open Foundation Models for Long-Form Music Generation

aaabiao authored a paper 11 days ago

IWR-Bench: Can LVLMs reconstruct interactive webpage from a user interaction video?

authored 8 papers 11 days ago

Aligning Instruction Tuning with Pre-training

Paper • 2501.09368 • Published Jan 16, 2025

YuE: Scaling Open Foundation Models for Long-Form Music Generation

Paper • 2503.08638 • Published Mar 11, 2025 • 73

IWR-Bench: Can LVLMs reconstruct interactive webpage from a user interaction video?

Paper • 2509.24709 • Published Sep 29, 2025 • 7

Beyond Correctness: Evaluating Subjective Writing Preferences Across Cultures

Paper • 2510.14616 • Published Oct 16, 2025 • 13

COIG-Writer: A High-Quality Dataset for Chinese Creative Writing with Thought Processes

Paper • 2510.14763 • Published Oct 16, 2025 • 14

Dynamic Large Concept Models: Latent Reasoning in an Adaptive Semantic Space

Paper • 2512.24617 • Published Dec 31, 2025 • 67

Search More, Think Less: Rethinking Long-Horizon Agentic Search for Efficiency and Generalization

Paper • 2602.22675 • Published Feb 26 • 23

LoopCoder-v2: Only Loop Once for Efficient Test-Time Computation Scaling

Paper • 2606.18023 • Published 13 days ago • 207

in MMMU/MMMU_Pro 20 days ago

Update eval.yaml

#7 opened 23 days ago

authored 3 papers 23 days ago

RewardHarness: Self-Evolving Agentic Post-Training

Paper • 2605.08703 • Published May 9 • 10

Cosmos 3: Omnimodal World Models for Physical AI

Paper • 2606.02800 • Published 28 days ago • 136

AutoLab: Can Frontier Models Solve Long-Horizon Auto Research and Engineering Tasks?

Paper • 2606.05080 • Published 26 days ago • 30

updated a dataset 29 days ago

MMMU/MMMU_Pro

Benchmark • Updated 20 days ago • 5.19k • 18.2k • 60

authored a paper about 1 month ago

QUEST: Training Frontier Deep Research Agents with Fully Synthetic Tasks

Paper • 2605.24218 • Published May 22 • 46

authored 4 papers about 2 months ago

Watch Before You Answer: Learning from Visually Grounded Post-Training

Paper • 2604.05117 • Published Apr 6 • 36

ClawBench: Can AI Agents Complete Everyday Online Tasks?

Paper • 2604.08523 • Published Apr 9 • 265

Nemotron 3 Super: Open, Efficient Mixture-of-Experts Hybrid Mamba-Transformer Model for Agentic Reasoning

Paper • 2604.12374 • Published Apr 14 • 37

Beyond Semantic Similarity: Rethinking Retrieval for Agentic Search via Direct Corpus Interaction

Paper • 2605.05242 • Published May 3 • 126

authored a paper about 2 months ago

Tuna-2: Pixel Embeddings Beat Vision Encoders for Multimodal Understanding and Generation

Paper • 2604.24763 • Published Apr 27 • 71

updated a dataset 2 months ago

MMMU/MMMU

Viewer • Updated Apr 21 • 11.6k • 59.4k • 329