Space: OpenCaptchaWorld/captchav2_prototype

OpenCaptchaWorld committed on Nov 7, 2025

Commit 441dd2f (1 parent: a042349)

debug dependencies

Files changed (4)

.gitignore +27 -0
README.md +59 -6
requirements.txt +4 -0
results.csv +3 -0

.gitignore ADDED

	@@ -0,0 +1,27 @@

+# Ignore runs directory (uploaded files)
+runs/
+*.json
+!results.csv
+# Python
+__pycache__/
+*.py[cod]
+*$py.class
+*.so
+.Python
+# Virtual environments
+venv/
+env/
+ENV/
+# IDE
+.vscode/
+.idea/
+*.swp
+*.swo
+# OS
+.DS_Store
+Thumbs.db

README.md CHANGED

@@ -1,14 +1,67 @@
 ---
-title: Captchav2 Prototype
-emoji: ⚡
 colorFrom: indigo
-colorTo: pink
 sdk: gradio
 sdk_version: 5.49.1
 app_file: app.py
 pinned: false
-license: apache-2.0
-short_description: inner development phase use
 ---
-Check out the configuration reference at https://huggingface.co/docs/hub/spaces-config-reference

 ---
+title: CAPTCHAv2 Leaderboard
+emoji: 🏆
 colorFrom: indigo
+colorTo: purple
 sdk: gradio
 sdk_version: 5.49.1
 app_file: app.py
 pinned: false
+license: mit
 ---
+# CAPTCHAv2 Leaderboard
+A comprehensive leaderboard for comparing model performance across different CAPTCHA puzzle types. This interactive dashboard allows you to:
+- 📊 View performance rankings across different CAPTCHA categories
+- 📈 Compare models using interactive visualizations
+- 💰 Analyze cost-effectiveness of different models
+- 🔄 Upload and update results easily
+## Features
+- **Interactive Leaderboard Table**: Sortable rankings with color-coded performance indicators
+- **Performance Comparison Charts**: Visual bar charts showing pass rates across models
+- **Performance by Type**: Detailed breakdown of performance across different CAPTCHA puzzle types
+- **Cost-Effectiveness Analysis**: Scatter plot comparing performance vs. cost
+- **Easy Upload**: Support for CSV and JSON result files
+## How to Use
+1. **View the Leaderboard**: Browse the current rankings and filter by category
+2. **Sort Results**: Sort by Pass Rate, Duration, or Cost
+3. **Upload Results**: Use the upload section to add new evaluation results
+4. **Compare Models**: Use the visualizations to compare different models
+## Uploading Results
+The leaderboard supports multiple file formats:
+- **CSV files**: Aggregated results with columns for Model, Provider, Agent Framework, Type, metrics, and per-type pass rates
+- **JSON files**: Single object or array of aggregated results
+- **benchmark_results.json**: Per-puzzle results in JSONL format (auto-converted)
+See the upload section in the app for detailed instructions and file format requirements.
+## Categories
+The leaderboard tracks performance across various CAPTCHA types including:
+- Dice Count
+- Color Cipher
+- Color Counting
+- Dynamic Jigsaw
+- Mirror
+- Set Game
+- Shadow Plausible
+- Spooky variants (Circle, Jigsaw, Shape Grid, Size, Text)
+- Trajectory Recovery
+- Transform Pipeline
+- And more...
+## Model Types
+Models are automatically categorized as:
+- **Proprietary**: Commercial models (OpenAI, Anthropic, Google, etc.)
+- **Open source**: Open source models (Llama, Mistral, Qwen, etc.)
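
Reviewer note on the "Uploading Results" section of the README: it says per-puzzle JSONL results are auto-converted into aggregated rows, but this commit does not show the conversion. The sketch below is one way such an aggregation could work. The field names `puzzle_type` and `passed`, the function name `aggregate_jsonl`, and the choice to compute the overall rate over all attempts (rather than averaging per-type rates) are assumptions for illustration, not taken from `app.py`.

```python
import json
from collections import defaultdict


def aggregate_jsonl(lines):
    """Aggregate per-puzzle JSONL records into per-type and overall pass rates.

    Assumes (hypothetically) that each record has a "puzzle_type" string
    and a boolean "passed" field.
    """
    totals = defaultdict(lambda: [0, 0])  # puzzle type -> [passed, attempted]
    for line in lines:
        line = line.strip()
        if not line:
            continue  # tolerate blank lines between records
        record = json.loads(line)
        counts = totals[record["puzzle_type"]]
        counts[0] += 1 if record["passed"] else 0
        counts[1] += 1

    per_type = {name: p / n for name, (p, n) in totals.items()}
    attempted = sum(n for _, n in totals.values())
    passed = sum(p for p, _ in totals.values())
    overall = passed / attempted if attempted else 0.0
    return {"Overall Pass Rate": overall, **per_type}


# Made-up records, for illustration only.
sample = [
    '{"puzzle_type": "Mirror", "passed": true}',
    '{"puzzle_type": "Mirror", "passed": false}',
    '{"puzzle_type": "Dice_Count", "passed": false}',
]
summary = aggregate_jsonl(sample)
print(summary)
```

Types with no attempts never appear in the output, which would correspond to the empty cells in `results.csv`.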

requirements.txt ADDED

	@@ -0,0 +1,4 @@

+gradio>=5.49.1
+pandas>=2.3.3
+matplotlib>=3.10.7
+numpy

results.csv ADDED

	@@ -0,0 +1,3 @@

+Model,Provider,Agent Framework,Type,Overall Pass Rate,Avg Duration (s),Avg Cost ($),Color_Cipher,Color_Counting,Dice_Count,Dynamic_Jigsaw,Static_Puzzle,Map_Parity,Mirror,Red_Dot,Set_Game,Shadow_Plausible,Spooky_Circle,Spooky_Circle_Grid,Spooky_Jigsaw,Spooky_Shape_Grid,Spooky_Size,Spooky_Text,Squiggle,Trajectory_Recovery,Transform_Pipeline
+gpt-5-2025-08-07,OpenAI,browser-use,Proprietary,0.16363636363636364, 10, 24.1,0.15,0.0,0.0,0.2,,,0.2,0.3,,0.0,0.2,0.4,,,0.0,0.0,0.35,,
+Browser-Use BU-1.0,browser-use,browser-use,Proprietary,0.03333333333333333, 2,8.3,0.1,0.0,0.0,,0.0,,0.1,0.0,,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.2,0.0,0.0
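
Reviewer note: the rows above contain stray spaces after some commas (` 10`, ` 24.1`) and empty cells for puzzle types that were not run. The sketch below shows one way to read such a file safely with the standard library. It uses only the first eight columns of the committed file as an excerpt, and the helper names `load_results` and `rank_by_pass_rate` are hypothetical; the app itself may well use pandas instead.

```python
import csv
import io

# Excerpt of results.csv from this commit (first eight columns only).
SAMPLE = """\
Model,Provider,Agent Framework,Type,Overall Pass Rate,Avg Duration (s),Avg Cost ($),Color_Cipher
gpt-5-2025-08-07,OpenAI,browser-use,Proprietary,0.16363636363636364, 10, 24.1,0.15
Browser-Use BU-1.0,browser-use,browser-use,Proprietary,0.03333333333333333, 2,8.3,0.1
"""

# Columns kept as text; every other column is treated as numeric.
TEXT_COLUMNS = {"Model", "Provider", "Agent Framework", "Type"}


def load_results(text):
    """Parse leaderboard CSV text; empty numeric cells become None."""
    # skipinitialspace drops the whitespace that follows a delimiter.
    reader = csv.DictReader(io.StringIO(text), skipinitialspace=True)
    rows = []
    for raw in reader:
        row = {}
        for key, value in raw.items():
            value = (value or "").strip()
            if key in TEXT_COLUMNS:
                row[key] = value
            else:
                row[key] = float(value) if value else None
        rows.append(row)
    return rows


def rank_by_pass_rate(rows):
    """Sort rows from highest to lowest overall pass rate."""
    return sorted(rows, key=lambda r: r["Overall Pass Rate"] or 0.0, reverse=True)


ranked = rank_by_pass_rate(load_results(SAMPLE))
for position, row in enumerate(ranked, start=1):
    print(position, row["Model"], row["Overall Pass Rate"])
```

Mapping empty cells to `None` rather than `0.0` keeps "not evaluated" distinct from "evaluated and failed every puzzle", which matters when averaging per-type rates.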