---
title: CAPTCHAv2 Leaderboard
emoji: 🏆
colorFrom: indigo
colorTo: purple
sdk: gradio
sdk_version: 5.49.1
app_file: app.py
pinned: false
license: mit
---

# CAPTCHAv2 Leaderboard

A leaderboard for comparing model performance across CAPTCHA puzzle types. The interactive dashboard lets you:

- 📊 View performance rankings across different CAPTCHA categories
- 📈 Compare models using interactive visualizations
- 💰 Analyze cost-effectiveness of different models
- 🔄 Upload and update results easily

## Features

- **Interactive Leaderboard Table**: Sortable rankings with color-coded performance indicators
- **Performance Comparison Charts**: Visual bar charts showing pass rates across models
- **Performance by Type**: Detailed breakdown of performance across different CAPTCHA puzzle types
- **Cost-Effectiveness Analysis**: Scatter plot comparing performance vs. cost
- **Easy Upload**: Support for CSV and JSON result files

## How to Use

1. **View the Leaderboard**: Browse the current rankings and filter by category
2. **Sort Results**: Sort by Pass Rate, Duration, or Cost
3. **Upload Results**: Use the upload section to add new evaluation results
4. **Compare Models**: Use the visualizations to compare different models

## Uploading Results

The leaderboard supports multiple file formats:

- **CSV files**: Aggregated results with columns for Model, Provider, Agent Framework, Type, metrics, and per-type pass rates
- **JSON files**: Single object or array of aggregated results
- **benchmark_results.json**: Per-puzzle results in JSONL format (one JSON object per line), automatically converted to aggregated results

See the upload section in the app for detailed instructions and file format requirements.
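
To illustrate what converting per-puzzle results into aggregated results can look like, here is a minimal sketch. The field names `puzzle_type` and `passed`, and the function `aggregate_results`, are assumptions made for this example and may not match the schema the app actually expects; check the upload section for the real format.

```python
import json
from collections import defaultdict


def aggregate_results(jsonl_text):
    """Collapse per-puzzle JSONL records into overall and per-type pass rates.

    Assumes each line is a JSON object with hypothetical fields
    "puzzle_type" (str) and "passed" (bool).
    """
    totals = defaultdict(lambda: [0, 0])  # puzzle_type -> [passed, attempted]
    for line in jsonl_text.splitlines():
        line = line.strip()
        if not line:
            continue  # tolerate blank lines between records
        record = json.loads(line)
        counts = totals[record["puzzle_type"]]
        counts[0] += 1 if record["passed"] else 0
        counts[1] += 1

    passed = sum(p for p, _ in totals.values())
    attempted = sum(n for _, n in totals.values())
    return {
        "overall_pass_rate": passed / attempted if attempted else 0.0,
        "per_type_pass_rate": {t: p / n for t, (p, n) in totals.items()},
    }


# Three illustrative records (made-up data, not real benchmark results).
sample = "\n".join([
    '{"puzzle_type": "Dice Count", "passed": true}',
    '{"puzzle_type": "Dice Count", "passed": false}',
    '{"puzzle_type": "Mirror", "passed": true}',
])
summary = aggregate_results(sample)
```

Here `summary["per_type_pass_rate"]` maps each puzzle type to the fraction of its puzzles that passed, which corresponds to the per-type pass rate columns described for the aggregated formats.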

## Categories

The leaderboard tracks performance across various CAPTCHA types including:
- Dice Count
- Color Cipher
- Color Counting
- Dynamic Jigsaw
- Mirror
- Set Game
- Shadow Plausible
- Spooky variants (Circle, Jigsaw, Shape Grid, Size, Text)
- Trajectory Recovery
- Transform Pipeline
- And more...

## Model Types

Models are automatically categorized as:
- **Proprietary**: Commercial models (OpenAI, Anthropic, Google, etc.)
- **Open source**: Open-source models (Llama, Mistral, Qwen, etc.)
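
One simple way such a categorization could work is keyword matching on the model name. The sketch below is an assumption for illustration only: the keyword list, the function name `categorize_model`, and the choice to default to "Proprietary" are hypothetical and are not taken from the app's code.

```python
# Hypothetical keyword list, built from the examples named above.
OPEN_SOURCE_KEYWORDS = ("llama", "mistral", "qwen")


def categorize_model(model_name):
    """Return "Open source" if the name contains a known keyword, else "Proprietary".

    Defaulting unknown names to "Proprietary" is an assumption of this sketch.
    """
    name = model_name.lower()
    if any(keyword in name for keyword in OPEN_SOURCE_KEYWORDS):
        return "Open source"
    return "Proprietary"


labels = [categorize_model(m) for m in ("Qwen2.5-VL-72B", "gpt-4o")]
```

A keyword approach is easy to extend but misclassifies any model whose name is not on the list, so an explicit Type column in the uploaded file is a more reliable source when one is available.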