OpenCaptchaWorld committed
Commit 441dd2f · 1 Parent(s): a042349

debug dependencies

Files changed (4)
  1. .gitignore +27 -0
  2. README.md +59 -6
  3. requirements.txt +4 -0
  4. results.csv +3 -0
.gitignore ADDED
@@ -0,0 +1,27 @@
+ # Ignore runs directory (uploaded files)
+ runs/
+ *.json
+ !results.csv
+
+ # Python
+ __pycache__/
+ *.py[cod]
+ *$py.class
+ *.so
+ .Python
+
+ # Virtual environments
+ venv/
+ env/
+ ENV/
+
+ # IDE
+ .vscode/
+ .idea/
+ *.swp
+ *.swo
+
+ # OS
+ .DS_Store
+ Thumbs.db
+
README.md CHANGED
@@ -1,14 +1,67 @@
  ---
- title: Captchav2 Prototype
- emoji: ⚑
+ title: CAPTCHAv2 Leaderboard
+ emoji: 🏆
  colorFrom: indigo
- colorTo: pink
+ colorTo: purple
  sdk: gradio
  sdk_version: 5.49.1
  app_file: app.py
  pinned: false
- license: apache-2.0
- short_description: inner development phase use
+ license: mit
  ---
 
- Check out the configuration reference at https://huggingface.co/docs/hub/spaces-config-reference
+ # CAPTCHAv2 Leaderboard
+
+ A comprehensive leaderboard for comparing model performance across different CAPTCHA puzzle types. This interactive dashboard allows you to:
+
+ - 📊 View performance rankings across different CAPTCHA categories
+ - 📈 Compare models using interactive visualizations
+ - 💰 Analyze cost-effectiveness of different models
+ - 🔄 Upload and update results easily
+
+ ## Features
+
+ - **Interactive Leaderboard Table**: Sortable rankings with color-coded performance indicators
+ - **Performance Comparison Charts**: Visual bar charts showing pass rates across models
+ - **Performance by Type**: Detailed breakdown of performance across different CAPTCHA puzzle types
+ - **Cost-Effectiveness Analysis**: Scatter plot comparing performance vs. cost
+ - **Easy Upload**: Support for CSV and JSON result files
+
+ ## How to Use
+
+ 1. **View the Leaderboard**: Browse the current rankings and filter by category
+ 2. **Sort Results**: Sort by Pass Rate, Duration, or Cost
+ 3. **Upload Results**: Use the upload section to add new evaluation results
+ 4. **Compare Models**: Use the visualizations to compare different models
+
+ ## Uploading Results
+
+ The leaderboard supports multiple file formats:
+
+ - **CSV files**: Aggregated results with columns for Model, Provider, Agent Framework, Type, metrics, and per-type pass rates
+ - **JSON files**: Single object or array of aggregated results
+ - **benchmark_results.json**: Per-puzzle results in JSONL format (auto-converted)
+
+ See the upload section in the app for detailed instructions and file format requirements.
+
+ ## Categories
+
+ The leaderboard tracks performance across various CAPTCHA types including:
+ - Dice Count
+ - Color Cipher
+ - Color Counting
+ - Dynamic Jigsaw
+ - Mirror
+ - Set Game
+ - Shadow Plausible
+ - Spooky variants (Circle, Jigsaw, Shape Grid, Size, Text)
+ - Trajectory Recovery
+ - Transform Pipeline
+ - And more...
+
+ ## Model Types
+
+ Models are automatically categorized as:
+ - **Proprietary**: Commercial models (OpenAI, Anthropic, Google, etc.)
+ - **Open source**: Open source models (Llama, Mistral, Qwen, etc.)
+
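Reviewer note: the README says per-puzzle results in JSONL format are auto-converted into aggregated results, but this commit does not show how. The sketch below illustrates one plausible way such a conversion could work, using only the Python standard library. The field names `puzzle_type` and `correct` and the function `aggregate_pass_rates` are hypothetical; the actual schema of `benchmark_results.json` and the logic in `app.py` are not visible here.

```python
import json
from collections import defaultdict


def aggregate_pass_rates(jsonl_text, type_key="puzzle_type", result_key="correct"):
    """Aggregate per-puzzle JSONL records into an overall and per-type pass rate.

    type_key and result_key are assumed field names, not confirmed by the README.
    """
    attempts = defaultdict(int)
    passes = defaultdict(int)
    for line in jsonl_text.splitlines():
        line = line.strip()
        if not line:
            continue  # tolerate blank lines between records
        record = json.loads(line)
        puzzle_type = record[type_key]
        attempts[puzzle_type] += 1
        if record[result_key]:
            passes[puzzle_type] += 1
    per_type = {t: passes[t] / attempts[t] for t in attempts}
    total = sum(attempts.values())
    overall = sum(passes.values()) / total if total else 0.0
    return overall, per_type


# Four made-up records, one JSON object per line (JSONL).
sample = "\n".join([
    json.dumps({"puzzle_type": "Dice_Count", "correct": True}),
    json.dumps({"puzzle_type": "Dice_Count", "correct": False}),
    json.dumps({"puzzle_type": "Mirror", "correct": False}),
    json.dumps({"puzzle_type": "Mirror", "correct": False}),
])
overall, per_type = aggregate_pass_rates(sample)
print(overall, per_type)
```

Here the overall rate is computed over all puzzles rather than as a mean of the per-type rates; which of the two the leaderboard uses is not stated in the README.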
requirements.txt ADDED
@@ -0,0 +1,4 @@
+ gradio>=5.49.1
+ pandas>=2.3.3
+ matplotlib>=3.10.7
+ numpy
results.csv ADDED
@@ -0,0 +1,3 @@
+ Model,Provider,Agent Framework,Type,Overall Pass Rate,Avg Duration (s),Avg Cost ($),Color_Cipher,Color_Counting,Dice_Count,Dynamic_Jigsaw,Static_Puzzle,Map_Parity,Mirror,Red_Dot,Set_Game,Shadow_Plausible,Spooky_Circle,Spooky_Circle_Grid,Spooky_Jigsaw,Spooky_Shape_Grid,Spooky_Size,Spooky_Text,Squiggle,Trajectory_Recovery,Transform_Pipeline
+ gpt-5-2025-08-07,OpenAI,browser-use,Proprietary,0.16363636363636364, 10, 24.1,0.15,0.0,0.0,0.2,,,0.2,0.3,,0.0,0.2,0.4,,,0.0,0.0,0.35,,
+ Browser-Use BU-1.0,browser-use,browser-use,Proprietary,0.03333333333333333, 2,8.3,0.1,0.0,0.0,,0.0,,0.1,0.0,,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.2,0.0,0.0
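Reviewer note: some numeric fields in the added rows carry a leading space (` 10`, ` 24.1`, ` 2`), and several per-type columns are empty. The sketch below shows one way a consumer might parse these rows defensively and rank them by pass rate, using only the standard library. The two data rows are copied from the `results.csv` hunk; `load_results` and `TEXT_COLUMNS` are hypothetical names, and this is not necessarily how `app.py` reads the file.

```python
import csv
import io

# Header and rows copied verbatim from the results.csv hunk in this commit.
CSV_TEXT = """\
Model,Provider,Agent Framework,Type,Overall Pass Rate,Avg Duration (s),Avg Cost ($),Color_Cipher,Color_Counting,Dice_Count,Dynamic_Jigsaw,Static_Puzzle,Map_Parity,Mirror,Red_Dot,Set_Game,Shadow_Plausible,Spooky_Circle,Spooky_Circle_Grid,Spooky_Jigsaw,Spooky_Shape_Grid,Spooky_Size,Spooky_Text,Squiggle,Trajectory_Recovery,Transform_Pipeline
gpt-5-2025-08-07,OpenAI,browser-use,Proprietary,0.16363636363636364, 10, 24.1,0.15,0.0,0.0,0.2,,,0.2,0.3,,0.0,0.2,0.4,,,0.0,0.0,0.35,,
Browser-Use BU-1.0,browser-use,browser-use,Proprietary,0.03333333333333333, 2,8.3,0.1,0.0,0.0,,0.0,,0.1,0.0,,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.2,0.0,0.0
"""

# Columns that hold labels rather than numbers.
TEXT_COLUMNS = {"Model", "Provider", "Agent Framework", "Type"}


def load_results(text):
    """Parse the CSV text; numeric cells become floats, empty cells become None."""
    rows = []
    for raw in csv.DictReader(io.StringIO(text)):
        row = {}
        for name, value in raw.items():
            value = value.strip()  # drop the stray leading spaces
            if name in TEXT_COLUMNS:
                row[name] = value
            else:
                row[name] = float(value) if value else None
        rows.append(row)
    return rows


# Rank models by overall pass rate, best first.
ranked = sorted(load_results(CSV_TEXT), key=lambda r: r["Overall Pass Rate"], reverse=True)
for position, row in enumerate(ranked, start=1):
    print(position, row["Model"], row["Overall Pass Rate"], row["Avg Cost ($)"])
```

Keeping empty cells as `None` rather than `0.0` preserves the distinction between "not evaluated" and "evaluated with a zero pass rate", which matters when averaging or plotting per-type results.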