---
license: mit
library_name: autogluon
pipeline_tag: tabular-classification
datasets:
- scottymcgee/flowers
model-index:
- name: hw2_classical_automl
  results:
  - task:
      type: tabular-classification
      name: Tabular Classification
    dataset:
      name: scottymcgee/flowers
      type: scottymcgee/flowers
      split: test
    metrics:
    - name: accuracy
      type: accuracy
      value: 0.87
    - name: f1_macro
      type: f1
      value: 0.84
---

# HW2 Classical AutoML — AutoGluon TabularPredictor

## Model Overview

This model was trained with [AutoGluon `TabularPredictor`](https://auto.gluon.ai/) as part of Homework 2 for 24-679.
It predicts the **target column** (`color`) of Scotty’s HW1 tabular dataset from five numeric flower features (diameter, petal length, petal width, petal count, stem height).

The workflow demonstrates how **classical AutoML** can search across multiple baseline models (e.g., Random Forest, Gradient Boosting, Logistic Regression, Neural Net) with automatic preprocessing, feature generation, and hyperparameter tuning.
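
For illustration, a minimal inference sketch is shown below, assuming the predictor has been restored from the native AutoGluon directory (see "Repository Artifacts"); the feature values are made up:

```python
import pandas as pd
from autogluon.tabular import TabularPredictor

# Restore the trained predictor from its native directory
# (see "Repository Artifacts" below; the path is illustrative).
predictor = TabularPredictor.load("autogluon_predictor_dir")

# One hypothetical flower described by the five numeric features.
sample = pd.DataFrame([{
    "flower_diameter_cm": 5.2,
    "petal_length_cm": 3.1,
    "petal_width_cm": 1.4,
    "petal_count": 6,
    "stem_height_cm": 40.0,
}])

print(predictor.predict(sample))        # predicted color label
print(predictor.predict_proba(sample))  # per-class probabilities
```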

## Dataset

- **Source:** Scotty’s HW1 tabular dataset on Hugging Face (`scottymcgee/flowers`)
- **Samples:** ~30 original samples, expanded via augmentation
- **Features:** numeric (`flower_diameter_cm`, `petal_length_cm`, `petal_width_cm`, `petal_count`, `stem_height_cm`)
- **Target:** `color` (multiclass, 6 possible values)
- **Split:** 80% training, 20% validation (see the loading sketch after this list)
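
A minimal sketch of loading the data and reproducing the split, assuming the dataset is published as a single `train` split with the column names listed above:

```python
from datasets import load_dataset
from sklearn.model_selection import train_test_split

# Pull the tabular dataset from the Hugging Face Hub
# (assumes a single "train" split; adjust if the repo differs).
df = load_dataset("scottymcgee/flowers", split="train").to_pandas()

# 80/20 train/validation split with the seed used for training.
train_df, val_df = train_test_split(df, test_size=0.2, random_state=42)
print(f"train: {len(train_df)} rows, val: {len(val_df)} rows")
```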

## Training Configuration

- **Framework:** AutoGluon `TabularPredictor`
- **Presets:** `medium_quality` (balanced speed vs. accuracy; see the `fit()` sketch at the end of this section)
- **Problem Type:** `multiclass` classification
- **Time Limit:** 600 seconds (10 minutes)
- **Random Seed:** 42 (for a reproducible train/validation split)
- **Hardware:** Google Colab CPU/GPU runtime

AutoGluon automatically handled:

- Standardization of numeric features
- Encoding of categorical features (none in this dataset)
- Model ensembling and stacking
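
Put together, the settings above correspond to a fit call along these lines (a sketch, not the exact notebook code; `train_df` comes from the loading sketch in the previous section):

```python
from autogluon.tabular import TabularPredictor

# Configure and fit the predictor with the settings listed above.
predictor = TabularPredictor(
    label="color",
    problem_type="multiclass",
    path="autogluon_predictor_dir",  # output directory; name is illustrative
)
predictor.fit(
    train_data=train_df,
    presets="medium_quality",
    time_limit=600,  # 10 minutes
)
```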

## Results

- **Best model:** reported by the AutoGluon leaderboard (see the sketch below)
- **Validation Metric (Weighted F1):** ~0.9 (exact value depends on random seed / run)
- **Leaderboard:** includes candidate models such as RandomForest, ExtraTrees, GradientBoosting, and LightGBM

*Note:* Due to the small dataset size, metrics may vary slightly across runs.
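
The leaderboard and validation metrics can be inspected along these lines (a sketch; `predictor` and `val_df` come from the earlier sketches):

```python
# Rank all trained models on the held-out validation data.
lb = predictor.leaderboard(val_df)
print(lb[["model", "score_val"]])

# Aggregate metrics on the validation split.
print(predictor.evaluate(val_df))
```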

## Repository Artifacts

- `autogluon_predictor.pkl` → cloudpickled predictor (loadable only if library versions match)
- `autogluon_predictor_dir.zip` → zipped native AutoGluon directory (preferred for portability; loading sketch below)
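
A loading sketch for the zipped directory; the repo id and the extraction layout inside the archive are assumptions, not confirmed by this card:

```python
import zipfile

from autogluon.tabular import TabularPredictor
from huggingface_hub import hf_hub_download

# Download the zipped native AutoGluon directory from this model repo
# (repo_id is hypothetical; substitute the actual repo id).
zip_path = hf_hub_download(
    repo_id="scottymcgee/hw2_classical_automl",
    filename="autogluon_predictor_dir.zip",
)

# Unpack and load. Point TabularPredictor.load at the directory that
# actually contains the predictor files after extraction.
with zipfile.ZipFile(zip_path) as zf:
    zf.extractall("autogluon_predictor_dir")
predictor = TabularPredictor.load("autogluon_predictor_dir")
```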

## AI Tool Disclosure

This notebook used ChatGPT for scaffolding code and documentation.
All dataset selection, training, evaluation, and uploads were performed by the student.
|