hw2_classical_automl / README.md

george2cool36


metadata

license: mit
library_name: autogluon
pipeline_tag: tabular-classification
datasets:
  - scottymcgee/flowers
model-index:
  - name: hw2_classical_automl
    results:
      - task:
          type: tabular-classification
          name: Tabular Classification
        dataset:
          name: scottymcgee/flowers
          type: scottymcgee/flowers
          split: test
        metrics:
          - name: accuracy
            type: accuracy
            value: 0.87
          - name: f1_macro
            type: f1
            value: 0.84

HW2 Classical AutoML — AutoGluon TabularPredictor

Model Overview

This model was trained with AutoGluon's TabularPredictor as part of Homework 2 for 24-679.
It predicts the target column (color) of Scotty’s HW1 tabular dataset from five numeric flower features (diameter, petal length, petal width, petal count, stem height).

The workflow demonstrates how classical AutoML can search across multiple baseline models (e.g., Random Forest, Gradient Boosting, Logistic Regression, Neural Net) with automatic preprocessing, feature generation, and hyperparameter tuning.

Dataset

Source: Scotty’s HW1 tabular dataset on Hugging Face (scottymcgee/flowers)
Samples: ~30 original samples, expanded via augmentation
Features: numeric (flower_diameter_cm, petal_length_cm, petal_width_cm, petal_count, stem_height_cm)
Target: color (multiclass, 6 possible values)
Split: 80% training, 20% validation
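
The 80/20 split with a fixed seed can be sketched as follows. This is a minimal standard-library illustration, not the notebook's actual splitting code: the helper name `split_indices` is hypothetical, and the row count of 30 is only a placeholder, since the size of the augmented dataset is not stated here.

```python
import random


def split_indices(n_rows, val_fraction=0.2, seed=42):
    """Return (train_idx, val_idx) for a shuffled, seeded hold-out split."""
    if not 0.0 < val_fraction < 1.0:
        raise ValueError("val_fraction must be between 0 and 1")
    indices = list(range(n_rows))
    # A dedicated Random instance keeps the shuffle independent of global state,
    # so the same seed always yields the same split.
    random.Random(seed).shuffle(indices)
    n_val = max(1, round(n_rows * val_fraction))
    return sorted(indices[n_val:]), sorted(indices[:n_val])


# Placeholder row count (assumption); substitute the real number of rows.
train_idx, val_idx = split_indices(n_rows=30)
print(len(train_idx), len(val_idx))
```

With a pandas DataFrame `df`, the two index lists can be applied as `df.iloc[train_idx]` and `df.iloc[val_idx]`. scikit-learn's `train_test_split(df, test_size=0.2, random_state=42)` is a common alternative that serves the same purpose.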

Training Configuration

Framework: AutoGluon TabularPredictor
Presets: medium_quality (balanced speed vs. accuracy)
Problem Type: multiclass classification
Time Limit: 600 seconds (10 minutes)
Random Seed: 42 (for reproducible train/val split)
Hardware: Google Colab CPU/GPU runtime
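
A configuration like the one listed above could be passed to AutoGluon roughly as shown below. This is a hedged sketch rather than the exact training cell. The dict and function names are illustrative, the output path is hypothetical, and `eval_metric="f1_weighted"` is an assumption based on the weighted F1 reported under Results.

```python
# Values copied from the configuration list; the variable names are illustrative.
FIT_SETTINGS = {
    "presets": "medium_quality",
    "time_limit": 600,  # seconds
}
PREDICTOR_SETTINGS = {
    "label": "color",
    "problem_type": "multiclass",
    # Assumption: chosen to match the weighted F1 quoted in the Results section.
    "eval_metric": "f1_weighted",
}


def train_predictor(train_df, val_df=None, path="autogluon_predictor_dir"):
    """Fit a TabularPredictor on a pandas DataFrame that includes the `color` column.

    `path` is a hypothetical output folder, not a name confirmed by the notebook.
    """
    # Imported lazily so the settings above can be inspected without AutoGluon installed.
    from autogluon.tabular import TabularPredictor

    predictor = TabularPredictor(path=path, **PREDICTOR_SETTINGS)
    predictor.fit(train_data=train_df, tuning_data=val_df, **FIT_SETTINGS)
    return predictor
```

After fitting, `predictor.leaderboard(val_df)` returns a table of the candidate models and their scores on the held-out data.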

AutoGluon automatically handled:

Numeric feature preprocessing (scaling is applied inside the models that need it, such as the neural network)
Encoding of categorical features (none in this dataset)
Model ensembling and stacking

Results

Best model: Reported by AutoGluon leaderboard
Validation Metric (Weighted F1): ~0.9 (exact value varies between runs)
Test Metrics (from the model card metadata): accuracy 0.87, macro F1 0.84
Leaderboard: includes candidate models such as RandomForest, ExtraTrees, GradientBoosting, LightGBM

Note: Due to the small dataset size, metrics may vary slightly across runs.
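
Because this card quotes both a weighted and a macro F1, it may help to see how the two averages differ on imbalanced classes. The sketch below computes accuracy, macro F1, and weighted F1 in plain Python. The labels and predictions are made-up toy values, not outputs of this model, and the function names are illustrative.

```python
from collections import Counter


def f1_per_class(y_true, y_pred):
    """Return {label: F1} over every label seen in y_true or y_pred."""
    labels = sorted(set(y_true) | set(y_pred))
    scores = {}
    for label in labels:
        tp = sum(t == label and p == label for t, p in zip(y_true, y_pred))
        fp = sum(t != label and p == label for t, p in zip(y_true, y_pred))
        fn = sum(t == label and p != label for t, p in zip(y_true, y_pred))
        denom = 2 * tp + fp + fn
        scores[label] = 2 * tp / denom if denom else 0.0
    return scores


def summarize(y_true, y_pred):
    """Compute accuracy, macro F1 (plain mean) and weighted F1 (mean weighted by support)."""
    per_class = f1_per_class(y_true, y_pred)
    support = Counter(y_true)  # number of true samples per class
    n = len(y_true)
    accuracy = sum(t == p for t, p in zip(y_true, y_pred)) / n
    macro_f1 = sum(per_class.values()) / len(per_class)
    weighted_f1 = sum(per_class[c] * support[c] for c in per_class) / n
    return {"accuracy": accuracy, "f1_macro": macro_f1, "f1_weighted": weighted_f1}


# Toy example (hypothetical labels): one frequent class, two rare ones.
y_true = ["red", "red", "red", "red", "pink", "white"]
y_pred = ["red", "red", "red", "red", "pink", "pink"]
print(summarize(y_true, y_pred))
```

On this toy input the weighted F1 (about 0.78) is higher than the macro F1 (about 0.56), because the single misclassified rare class counts as a full third of the macro average but only a sixth of the weighted one. The same effect can make a weighted score look better than a macro score on a small, imbalanced dataset.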

Repository Artifacts

autogluon_predictor.pkl → cloudpickled predictor (loadable if library versions match)
autogluon_predictor_dir.zip → zipped native AutoGluon directory (preferred for portability)
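
Loading the zipped native directory might look like the sketch below. The helper names and the `predictor_native` target folder are hypothetical, and the internal layout of the archive is not documented here, so the code handles both a flat archive and one wrapped in a single top-level folder.

```python
import zipfile
from pathlib import Path


def extract_predictor_dir(zip_path, target_dir="predictor_native"):
    """Unzip the native AutoGluon directory and return the folder to load from."""
    target = Path(target_dir)
    target.mkdir(parents=True, exist_ok=True)
    with zipfile.ZipFile(zip_path) as archive:
        archive.extractall(target)
    # If the archive wraps everything in one top-level folder, load from that folder.
    entries = list(target.iterdir())
    if len(entries) == 1 and entries[0].is_dir():
        return entries[0]
    return target


def load_predictor(zip_path):
    """Load the predictor; needs an AutoGluon version compatible with the training one."""
    # Imported lazily so the extraction helper works without AutoGluon installed.
    from autogluon.tabular import TabularPredictor

    return TabularPredictor.load(str(extract_predictor_dir(zip_path)))
```

The zip file itself can be fetched with `huggingface_hub.hf_hub_download(repo_id=..., filename="autogluon_predictor_dir.zip")`. The repository id is not spelled out in this card; `george2cool36/hw2_classical_automl` is a guess based on the page header and should be verified before use.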

AI Tool Disclosure

This notebook used ChatGPT for scaffolding code and documentation. All dataset selection, training, evaluation, and uploads were performed by the student.