A continuously updated benchmark that evaluates AI coding agents on real-world software engineering tasks drawn from GitHub issues.