---
license: mit
library_name: autogluon
pipeline_tag: tabular-classification
datasets:
- scottymcgee/flowers
model-index:
- name: hw2_classical_automl
  results:
  - task:
      type: tabular-classification
      name: Tabular Classification
    dataset:
      name: scottymcgee/flowers
      type: scottymcgee/flowers
      split: test
    metrics:
    - name: accuracy
      type: accuracy
      value: 0.87 # <-- replace with your number
    - name: f1_macro
      type: f1
      value: 0.84 # <-- replace with your number
---

# HW2 Classical AutoML — AutoGluon TabularPredictor

## Model Overview

This model was trained using [AutoGluon TabularPredictor] as part of Homework 2 for 24-679. It predicts the **target column** (`color`) of Scotty’s HW1 tabular dataset from a set of numeric flower features (diameter, petal length, petal width, petal count, stem height).

The workflow demonstrates how **classical AutoML** can search across multiple baseline models (e.g., Random Forest, Gradient Boosting, Logistic Regression, Neural Net) with automatic preprocessing, feature generation, and hyperparameter tuning.

## Dataset

- **Source:** Scotty’s HW1 tabular dataset on Hugging Face (`scottymcgee/flowers`)
- **Samples:** ~30 original samples, expanded via augmentation
- **Features:** numeric (`flower_diameter_cm`, `petal_length_cm`, `petal_width_cm`, `petal_count`, `stem_height_cm`)
- **Target:** `color` (multiclass, 6 possible values)
- **Split:** 80% training, 20% validation

## Training Configuration

- **Framework:** AutoGluon `TabularPredictor`
- **Presets:** `medium_quality` (balanced speed vs.
accuracy)
- **Problem Type:** `multiclass` classification
- **Time Limit:** 600 seconds (10 minutes)
- **Random Seed:** 42 (for reproducible train/val split)
- **Hardware:** Google Colab CPU/GPU runtime

AutoGluon automatically handled:

- Standardization of numeric features
- Encoding of categorical features (none in this dataset)
- Model ensembling and stacking

## Results

- **Best model:** *reported by the AutoGluon leaderboard*
- **Validation Metric (Weighted F1):** ~0.9 (exact value depends on random seed / run)
- **Leaderboard:** includes candidate models such as RandomForest, ExtraTrees, GradientBoosting, LightGBM

*Note:* Due to the small dataset size, metrics may vary slightly across runs.

## Repository Artifacts

- `autogluon_predictor.pkl` → cloudpickled predictor (loadable if library versions match)
- `autogluon_predictor_dir.zip` → zipped native AutoGluon directory (preferred for portability)

## AI Tool Disclosure

This notebook used ChatGPT for scaffolding code and documentation. All dataset selection, training, evaluation, and uploads were performed by the student.
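## Example: Reproducible 80/20 Split

The seed-42 train/validation split described above can be sketched with the standard library alone. The inline `rows` list is a hypothetical stand-in for the ~30 `scottymcgee/flowers` records (in the actual notebook they would come from the Hugging Face dataset); only the split logic is the point here.

```python
import random

# Hypothetical rows standing in for the scottymcgee/flowers records;
# each row is (features..., color label).
rows = [(5.0 + i * 0.1, 2.0, 1.0, 5, 30.0, "red") for i in range(30)]

# Reproducible 80/20 train/validation split with seed 42,
# mirroring the split described in Training Configuration.
rng = random.Random(42)
shuffled = rows[:]
rng.shuffle(shuffled)

cut = int(len(shuffled) * 0.8)
train, val = shuffled[:cut], shuffled[cut:]

print(len(train), len(val))  # 24 6
```

Because `random.Random(42)` is seeded explicitly, re-running the cell yields the same 24/6 partition every time.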
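## Example: Training and Reloading Sketch

A minimal sketch of the training run and artifact reload, assuming `autogluon.tabular` is installed and that `train_df` / `val_df` are pandas DataFrames with the flower features plus the `color` column. The `eval_metric` choice and the unzipped directory name are assumptions, not confirmed by the notebook.

```python
from autogluon.tabular import TabularPredictor

predictor = TabularPredictor(
    label="color",                 # target column from the dataset card
    problem_type="multiclass",
    eval_metric="f1_weighted",     # assumption: matches the reported weighted F1
).fit(
    train_df,
    presets="medium_quality",
    time_limit=600,                # 10-minute budget from Training Configuration
)

# Inspect candidate models (RandomForest, LightGBM, ...) on held-out data.
print(predictor.leaderboard(val_df))

# Reloading later: unzip autogluon_predictor_dir.zip, then load natively
# (preferred over the cloudpickle, which is version-sensitive):
# predictor = TabularPredictor.load("autogluon_predictor_dir")
```

Loading from the zipped native directory sidesteps the library-version pinning that the cloudpickled `autogluon_predictor.pkl` requires.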