---
license: mit
library_name: autogluon
pipeline_tag: tabular-classification
datasets:
- scottymcgee/flowers
model-index:
- name: hw2_classical_automl
results:
- task:
type: tabular-classification
name: Tabular Classification
dataset:
name: scottymcgee/flowers
type: scottymcgee/flowers
split: test
metrics:
- name: accuracy
type: accuracy
value: 0.87 # <-- replace with your number
- name: f1_macro
type: f1
value: 0.84 # <-- replace with your number
---
# HW2 Classical AutoML — AutoGluon TabularPredictor
## Model Overview
This model was trained with AutoGluon's `TabularPredictor` as part of Homework 2 for 24-679.
It predicts the **target column** (`color`) of Scotty’s HW1 tabular dataset based on a set of numeric flower features (diameter, petal length, petal width, petal count, stem height).
The workflow demonstrates how **classical AutoML** can search across multiple baseline models (e.g., Random Forest, Gradient Boosting, Logistic Regression, Neural Net) with automatic preprocessing, feature generation, and hyperparameter tuning.
## Dataset
- **Source:** Scotty’s HW1 tabular dataset on Hugging Face (`scottymcgee/flowers`)
- **Samples:** ~30 original samples, expanded via augmentation
- **Features:** numeric (flower_diameter_cm, petal_length_cm, petal_width_cm, petal_count, stem_height_cm)
- **Target:** `color` (multiclass, 6 possible values)
- **Split:** 80% training, 20% validation
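The 80/20 split can be reproduced in a few lines. The frame below is synthetic stand-in data (the real rows come from `scottymcgee/flowers` on the Hub), but the column names mirror the feature list above:

```python
import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split

# Synthetic stand-in for the real flower rows; column names match the card.
rng = np.random.default_rng(42)
df = pd.DataFrame({
    "flower_diameter_cm": rng.uniform(2.0, 12.0, 10).round(1),
    "petal_length_cm":    rng.uniform(1.0, 6.0, 10).round(1),
    "petal_width_cm":     rng.uniform(0.2, 3.0, 10).round(1),
    "petal_count":        rng.integers(4, 40, 10),
    "stem_height_cm":     rng.uniform(10.0, 80.0, 10).round(1),
    "color": ["red", "blue", "white", "yellow", "pink",
              "purple", "red", "blue", "white", "yellow"],
})

# 80% training / 20% validation with the card's seed for reproducibility.
train_df, val_df = train_test_split(df, test_size=0.2, random_state=42)
```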
## Training Configuration
- **Framework:** AutoGluon `TabularPredictor`
- **Presets:** `medium_quality` (balanced speed vs. accuracy)
- **Problem Type:** `multiclass` classification
- **Time Limit:** 600 seconds (10 minutes)
- **Random Seed:** 42 (for reproducible train/val split)
- **Hardware:** Google Colab CPU/GPU runtime
AutoGluon automatically handled:
- Standardization of numeric features
- Encoding of categorical features (none in this dataset)
- Model ensembling and stacking
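A minimal sketch of the training call implied by the settings above. The names `train_df` and `train_predictor` are illustrative (the actual notebook is not shown), and the heavy AutoGluon import is kept inside the function so the snippet stays importable without the dependency:

```python
# Fit settings from this card: medium_quality preset, 10-minute budget.
FIT_KWARGS = dict(presets="medium_quality", time_limit=600)

def train_predictor(train_df):
    """Fit an AutoGluon TabularPredictor to predict `color` (multiclass)."""
    from autogluon.tabular import TabularPredictor  # local import: heavy dependency
    predictor = TabularPredictor(label="color", problem_type="multiclass")
    predictor.fit(train_df, **FIT_KWARGS)
    return predictor
```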
## Results
- **Best model:** reported by the AutoGluon leaderboard (varies across runs)
- **Validation Metric (Weighted F1):** ~0.9 (exact value depends on random seed / run)
- **Leaderboard:** includes candidate models such as RandomForest, ExtraTrees, GradientBoosting, LightGBM
*Note:* Due to the small dataset size, metrics may vary slightly across runs.
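The `accuracy` and `f1_macro` values in the card header can be recomputed from held-out predictions with scikit-learn; a sketch (the label arrays below are illustrative, not the real validation set):

```python
from sklearn.metrics import accuracy_score, f1_score

def card_metrics(y_true, y_pred):
    """Compute the two metrics reported in the card header."""
    return {
        "accuracy": accuracy_score(y_true, y_pred),
        "f1_macro": f1_score(y_true, y_pred, average="macro"),
    }

# Toy example with two of the six color classes.
scores = card_metrics(["red", "red", "blue", "blue"],
                      ["red", "blue", "blue", "blue"])
```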
## Repository Artifacts
- `autogluon_predictor.pkl` → cloudpickled predictor (loadable if library versions match)
- `autogluon_predictor_dir.zip` → zipped native AutoGluon directory (preferred for portability)
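Either artifact can be loaded roughly as sketched below. The paths and the assumption that the zip extracts to a flat predictor directory are illustrative; the pickle route only works if the AutoGluon and cloudpickle versions match the training environment:

```python
import zipfile

def load_from_zip(zip_path="autogluon_predictor_dir.zip", out_dir="predictor_dir"):
    """Preferred route: unzip the native AutoGluon directory and load it."""
    from autogluon.tabular import TabularPredictor  # local import: heavy dependency
    with zipfile.ZipFile(zip_path) as zf:
        zf.extractall(out_dir)
    # If the archive contains a nested folder, point load() at that folder instead.
    return TabularPredictor.load(out_dir)

def load_from_pickle(pkl_path="autogluon_predictor.pkl"):
    """Fallback: cloudpickled predictor; library versions must match training."""
    import cloudpickle
    with open(pkl_path, "rb") as f:
        return cloudpickle.load(f)
```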
## AI Tool Disclosure
This notebook used ChatGPT for scaffolding code and documentation.
All dataset selection, training, evaluation, and uploads were performed by the student.