---
license: mit
library_name: autogluon
pipeline_tag: tabular-classification
datasets:
- scottymcgee/flowers
model-index:
- name: hw2_classical_automl
  results:
  - task:
      type: tabular-classification
      name: Tabular Classification
    dataset:
      name: scottymcgee/flowers
      type: scottymcgee/flowers
      split: test
    metrics:
    - name: accuracy
      type: accuracy
      value: 0.87
    - name: f1_macro
      type: f1
      value: 0.84
---

# HW2 Classical AutoML — AutoGluon TabularPredictor

## Model Overview

This model was trained with [AutoGluon `TabularPredictor`](https://auto.gluon.ai/) as part of Homework 2 for 24-679.
It predicts the **target column** (`color`) of Scotty’s HW1 tabular dataset from five numeric flower features (diameter, petal length, petal width, petal count, stem height).

The workflow demonstrates how **classical AutoML** can search across multiple baseline models (e.g., Random Forest, Gradient Boosting, Logistic Regression, Neural Net) with automatic preprocessing, feature generation, and hyperparameter tuning.
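
For illustration, a minimal inference sketch is shown below, assuming the predictor has been restored from the native AutoGluon directory (see "Repository Artifacts"); the feature values are made up:

```python
import pandas as pd
from autogluon.tabular import TabularPredictor

# Restore the trained predictor from its native directory
# (see "Repository Artifacts" below; the path is illustrative).
predictor = TabularPredictor.load("autogluon_predictor_dir")

# One hypothetical flower described by the five numeric features.
sample = pd.DataFrame([{
    "flower_diameter_cm": 5.2,
    "petal_length_cm": 3.1,
    "petal_width_cm": 1.4,
    "petal_count": 6,
    "stem_height_cm": 40.0,
}])

print(predictor.predict(sample))        # predicted color label
print(predictor.predict_proba(sample))  # per-class probabilities
```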

## Dataset

- **Source:** Scotty’s HW1 tabular dataset on Hugging Face (`scottymcgee/flowers`)
- **Samples:** ~30 original samples, expanded via augmentation
- **Features:** numeric (`flower_diameter_cm`, `petal_length_cm`, `petal_width_cm`, `petal_count`, `stem_height_cm`)
- **Target:** `color` (multiclass, 6 possible values)
- **Split:** 80% training, 20% validation (see the loading sketch after this list)
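
A minimal sketch of loading the data and reproducing the split, assuming the dataset is published as a single `train` split with the column names listed above:

```python
from datasets import load_dataset
from sklearn.model_selection import train_test_split

# Pull the tabular dataset from the Hugging Face Hub
# (assumes a single "train" split; adjust if the repo differs).
df = load_dataset("scottymcgee/flowers", split="train").to_pandas()

# 80/20 train/validation split with the seed used for training.
train_df, val_df = train_test_split(df, test_size=0.2, random_state=42)
print(f"train: {len(train_df)} rows, val: {len(val_df)} rows")
```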

## Training Configuration

- **Framework:** AutoGluon `TabularPredictor`
- **Presets:** `medium_quality` (balanced speed vs. accuracy; see the `fit()` sketch at the end of this section)
- **Problem Type:** `multiclass` classification
- **Time Limit:** 600 seconds (10 minutes)
- **Random Seed:** 42 (for a reproducible train/validation split)
- **Hardware:** Google Colab CPU/GPU runtime

AutoGluon automatically handled:

- Standardization of numeric features
- Encoding of categorical features (none in this dataset)
- Model ensembling and stacking
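
Put together, the settings above correspond to a fit call along these lines (a sketch, not the exact notebook code; `train_df` comes from the loading sketch in the previous section):

```python
from autogluon.tabular import TabularPredictor

# Configure and fit the predictor with the settings listed above.
predictor = TabularPredictor(
    label="color",
    problem_type="multiclass",
    path="autogluon_predictor_dir",  # output directory; name is illustrative
)
predictor.fit(
    train_data=train_df,
    presets="medium_quality",
    time_limit=600,  # 10 minutes
)
```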

## Results

- **Best model:** reported by the AutoGluon leaderboard (see the sketch below)
- **Validation Metric (Weighted F1):** ~0.9 (exact value depends on random seed / run)
- **Leaderboard:** includes candidate models such as RandomForest, ExtraTrees, GradientBoosting, and LightGBM

*Note:* Due to the small dataset size, metrics may vary slightly across runs.
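
The leaderboard and validation metrics can be inspected along these lines (a sketch; `predictor` and `val_df` come from the earlier sketches):

```python
# Rank all trained models on the held-out validation data.
lb = predictor.leaderboard(val_df)
print(lb[["model", "score_val"]])

# Aggregate metrics on the validation split.
print(predictor.evaluate(val_df))
```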

## Repository Artifacts

- `autogluon_predictor.pkl` → cloudpickled predictor (loadable only if library versions match)
- `autogluon_predictor_dir.zip` → zipped native AutoGluon directory (preferred for portability; loading sketch below)
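
A loading sketch for the zipped directory; the repo id and the extraction layout inside the archive are assumptions, not confirmed by this card:

```python
import zipfile

from autogluon.tabular import TabularPredictor
from huggingface_hub import hf_hub_download

# Download the zipped native AutoGluon directory from this model repo
# (repo_id is hypothetical; substitute the actual repo id).
zip_path = hf_hub_download(
    repo_id="scottymcgee/hw2_classical_automl",
    filename="autogluon_predictor_dir.zip",
)

# Unpack and load. Point TabularPredictor.load at the directory that
# actually contains the predictor files after extraction.
with zipfile.ZipFile(zip_path) as zf:
    zf.extractall("autogluon_predictor_dir")
predictor = TabularPredictor.load("autogluon_predictor_dir")
```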

## AI Tool Disclosure

This notebook used ChatGPT for scaffolding code and documentation.
All dataset selection, training, evaluation, and uploads were performed by the student.
|