---
title: CAPTCHAv2 Leaderboard
emoji: π
colorFrom: indigo
colorTo: purple
sdk: gradio
sdk_version: 5.49.1
app_file: app.py
pinned: false
license: mit
---
# CAPTCHAv2 Leaderboard
A leaderboard for comparing model performance across different CAPTCHA puzzle types. This interactive dashboard lets you:
- View performance rankings across different CAPTCHA categories
- Compare models using interactive visualizations
- Analyze the cost-effectiveness of different models
- Upload and update results easily
## Features
- **Interactive Leaderboard Table**: Sortable rankings with color-coded performance indicators
- **Performance Comparison Charts**: Visual bar charts showing pass rates across models
- **Performance by Type**: Detailed breakdown of performance across different CAPTCHA puzzle types
- **Cost-Effectiveness Analysis**: Scatter plot comparing performance vs. cost
- **Easy Upload**: Support for CSV and JSON result files
## How to Use
1. **View the Leaderboard**: Browse the current rankings and filter by category
2. **Sort Results**: Sort by Pass Rate, Duration, or Cost
3. **Upload Results**: Use the upload section to add new evaluation results
4. **Compare Models**: Use the visualizations to compare different models
## Uploading Results
The leaderboard supports multiple file formats:
- **CSV files**: Aggregated results with columns for Model, Provider, Agent Framework, Type, metrics, and per-type pass rates
- **JSON files**: Single object or array of aggregated results
- **benchmark_results.json**: Per-puzzle results in JSONL format (one JSON object per line), converted automatically
See the upload section in the app for detailed instructions and file format requirements.
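As a rough illustration of what converting per-puzzle JSONL records into per-type pass rates could look like, here is a minimal sketch. The field names `puzzle_type` and `passed` and the function `aggregate_pass_rates` are assumptions made for this example; the app's actual schema and conversion logic may differ, so check the upload section for the real requirements.

```python
import json
from collections import defaultdict


def aggregate_pass_rates(jsonl_text):
    """Return {puzzle_type: pass_rate} from per-puzzle JSONL records.

    Assumes each line is a JSON object with hypothetical fields
    "puzzle_type" (str) and "passed" (bool).
    """
    passed = defaultdict(int)
    total = defaultdict(int)
    for line in jsonl_text.splitlines():
        line = line.strip()
        if not line:
            continue  # skip blank lines
        record = json.loads(line)
        puzzle_type = record["puzzle_type"]  # hypothetical field name
        total[puzzle_type] += 1
        if record["passed"]:  # hypothetical field name
            passed[puzzle_type] += 1
    # Fraction of passed puzzles for each type seen in the input
    return {t: passed[t] / total[t] for t in total}


# Illustrative records, not real benchmark data
sample = "\n".join([
    '{"puzzle_type": "Dice Count", "passed": true}',
    '{"puzzle_type": "Dice Count", "passed": false}',
    '{"puzzle_type": "Mirror", "passed": true}',
])
rates = aggregate_pass_rates(sample)
```

Here `rates` maps each puzzle type to the fraction of its puzzles that passed, which is the kind of per-type value the aggregated CSV and JSON formats carry.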
## Categories
The leaderboard tracks performance across various CAPTCHA types including:
- Dice Count
- Color Cipher
- Color Counting
- Dynamic Jigsaw
- Mirror
- Set Game
- Shadow Plausible
- Spooky variants (Circle, Jigsaw, Shape Grid, Size, Text)
- Trajectory Recovery
- Transform Pipeline
- And more...
## Model Types
Models are automatically categorized as:
- **Proprietary**: Commercial models (OpenAI, Anthropic, Google, etc.)
- **Open source**: Openly released models (Llama, Mistral, Qwen, etc.)
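
One way such automatic categorization might work is simple keyword matching on the model and provider names. The sketch below is only an illustration under that assumption: the keyword lists, the `categorize_model` function, and the `"Unknown"` fallback are hypothetical and are not taken from the app's code.

```python
# Hypothetical keyword lists, based on the examples named above
OPEN_SOURCE_KEYWORDS = ("llama", "mistral", "qwen")
PROPRIETARY_PROVIDERS = ("openai", "anthropic", "google")


def categorize_model(model_name, provider=""):
    """Guess a model category from its name and provider (case-insensitive)."""
    text = f"{provider} {model_name}".lower()
    if any(keyword in text for keyword in OPEN_SOURCE_KEYWORDS):
        return "Open source"
    if any(name in text for name in PROPRIETARY_PROVIDERS):
        return "Proprietary"
    return "Unknown"  # fallback category assumed for this sketch


print(categorize_model("Qwen2.5-VL-72B"))    # Open source
print(categorize_model("gpt-4o", "OpenAI"))  # Proprietary
```

Keyword matching is easy to extend but can misclassify models whose names do not mention their family, so an explicit override column in the results file would be a reasonable complement.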