---
license: mit
library_name: autogluon
pipeline_tag: tabular-classification
datasets:
- scottymcgee/flowers
model-index:
- name: hw2_classical_automl
  results:
  - task:
      type: tabular-classification
      name: Tabular Classification
    dataset:
      name: scottymcgee/flowers
      type: scottymcgee/flowers
      split: test
    metrics:
    - name: accuracy
      type: accuracy
      value: 0.87 # <-- replace with your number
    - name: f1_macro
      type: f1
      value: 0.84 # <-- replace with your number
---

# HW2 Classical AutoML — AutoGluon TabularPredictor

## Model Overview

This model was trained using [AutoGluon TabularPredictor] as part of Homework 2 for 24-679. It predicts the **target column** (`color`) of Scotty’s HW1 tabular dataset from a set of numeric flower features (diameter, petal length, petal width, petal count, stem height).

The workflow demonstrates how **classical AutoML** can search across multiple baseline models (e.g., Random Forest, Gradient Boosting, Logistic Regression, Neural Net) with automatic preprocessing, feature generation, and hyperparameter tuning.

## Dataset

- **Source:** Scotty’s HW1 tabular dataset on Hugging Face (`scottymcgee/flowers`)
- **Samples:** ~30 original samples, expanded via augmentation
- **Features:** numeric (`flower_diameter_cm`, `petal_length_cm`, `petal_width_cm`, `petal_count`, `stem_height_cm`)
- **Target:** `color` (multiclass, 6 possible values)
- **Split:** 80% training, 20% validation

## Training Configuration

- **Framework:** AutoGluon `TabularPredictor`
- **Presets:** `medium_quality` (balanced speed vs.
accuracy)
- **Problem Type:** `multiclass` classification
- **Time Limit:** 600 seconds (10 minutes)
- **Random Seed:** 42 (for reproducible train/val split)
- **Hardware:** Google Colab CPU/GPU runtime

AutoGluon automatically handled:

- Standardization of numeric features
- Encoding of categorical features (none in this dataset)
- Model ensembling and stacking

## Results

- **Best model:** *reported by the AutoGluon leaderboard*
- **Validation Metric (Weighted F1):** ~0.9 (exact value depends on random seed / run)
- **Leaderboard:** includes candidate models such as RandomForest, ExtraTrees, GradientBoosting, LightGBM

*Note:* Due to the small dataset size, metrics may vary slightly across runs.

## Repository Artifacts

- `autogluon_predictor.pkl` → cloudpickled predictor (loadable if library versions match)
- `autogluon_predictor_dir.zip` → zipped native AutoGluon directory (preferred for portability)

## AI Tool Disclosure

This notebook used ChatGPT for scaffolding code and documentation. All dataset selection, training, evaluation, and uploads were performed by the student.
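## Example: Reproducible 80/20 Split

The seed-42 train/validation split described above can be sketched with the standard library alone. The inline `rows` list is a hypothetical stand-in for the ~30 `scottymcgee/flowers` records (in the actual notebook they would come from the Hugging Face dataset); only the split logic is the point here.

```python
import random

# Hypothetical rows standing in for the scottymcgee/flowers records;
# each row is (features..., color label).
rows = [(5.0 + i * 0.1, 2.0, 1.0, 5, 30.0, "red") for i in range(30)]

# Reproducible 80/20 train/validation split with seed 42,
# mirroring the split described in Training Configuration.
rng = random.Random(42)
shuffled = rows[:]
rng.shuffle(shuffled)

cut = int(len(shuffled) * 0.8)
train, val = shuffled[:cut], shuffled[cut:]

print(len(train), len(val))  # 24 6
```

Because `random.Random(42)` is seeded explicitly, re-running the cell yields the same 24/6 partition every time.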
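## Example: Training and Reloading Sketch

A minimal sketch of the training run and artifact reload, assuming `autogluon.tabular` is installed and that `train_df` / `val_df` are pandas DataFrames with the flower features plus the `color` column. The `eval_metric` choice and the unzipped directory name are assumptions, not confirmed by the notebook.

```python
from autogluon.tabular import TabularPredictor

predictor = TabularPredictor(
    label="color",                 # target column from the dataset card
    problem_type="multiclass",
    eval_metric="f1_weighted",     # assumption: matches the reported weighted F1
).fit(
    train_df,
    presets="medium_quality",
    time_limit=600,                # 10-minute budget from Training Configuration
)

# Inspect candidate models (RandomForest, LightGBM, ...) on held-out data.
print(predictor.leaderboard(val_df))

# Reloading later: unzip autogluon_predictor_dir.zip, then load natively
# (preferred over the cloudpickle, which is version-sensitive):
# predictor = TabularPredictor.load("autogluon_predictor_dir")
```

Loading from the zipped native directory sidesteps the library-version pinning that the cloudpickled `autogluon_predictor.pkl` requires.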