Music Genre Classification using AST
Model Overview
This model is a fine-tuned Audio Spectrogram Transformer (AST) for music genre classification.
It predicts one of the following 10 genres:
- blues, classical, country, disco, hiphop
- jazz, metal, pop, reggae, rock
Architecture
- Base Model: MIT/ast-finetuned-audioset
- Type: Transformer (Audio Spectrogram Transformer)
- Framework: PyTorch + Hugging Face Transformers
Task
Audio Classification (Music Genre Classification)
Dataset
- Training Data: Clean instrument stems (drums, vocals, bass, others)
- Test Data: Noisy mashups with:
  - cross-song mixing
  - tempo variation
  - ESC-50 noise injection
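The card does not say how noise was injected into the test mashups. As one illustration only, the sketch below mixes a noise clip into a signal at a chosen signal-to-noise ratio. The function name `mix_at_snr`, the SNR-based scaling, and the looping of short noise clips are all assumptions, not the author's pipeline.

```python
import math


def mix_at_snr(signal, noise, snr_db):
    """Add `noise` to `signal`, scaled so the mix has roughly `snr_db` dB SNR.

    Hypothetical helper: the original card does not describe its mixing method.
    """
    if not signal or not noise:
        raise ValueError("signal and noise must be non-empty")

    # Loop the noise clip if it is shorter than the signal, then trim to length.
    repeats = math.ceil(len(signal) / len(noise))
    noise = (list(noise) * repeats)[: len(signal)]

    signal_power = sum(s * s for s in signal) / len(signal)
    noise_power = sum(n * n for n in noise) / len(noise)
    if signal_power == 0 or noise_power == 0:
        # Nothing meaningful to scale against; return the signal unchanged.
        return list(signal)

    # Pick gain so that signal_power / (gain^2 * noise_power) == 10^(snr_db / 10).
    gain = math.sqrt(signal_power / (noise_power * 10 ** (snr_db / 10)))
    return [s + gain * n for s, n in zip(signal, noise)]
```

A lower `snr_db` gives a noisier mix; at 0 dB the scaled noise carries the same power as the signal.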
Preprocessing
- Sampling rate: 16 kHz
- Fixed duration: 15 seconds
- Padding / truncation applied
- Feature extraction using ASTFeatureExtractor
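The padding and truncation step can be sketched as follows. This is a minimal illustration, assuming zero-padding at the end of short clips and keeping the first 15 seconds of long ones; the card does not state which side is padded or cut, and the names `fix_length` and `TARGET_LEN` are hypothetical.

```python
SAMPLE_RATE = 16_000                      # 16 kHz, as listed above
CLIP_SECONDS = 15                         # fixed clip duration
TARGET_LEN = SAMPLE_RATE * CLIP_SECONDS   # 240,000 samples per clip


def fix_length(samples, target_len=TARGET_LEN):
    """Return `samples` cut or zero-padded to exactly `target_len` values.

    Assumption: pad at the end, truncate from the end.
    """
    samples = list(samples)
    if len(samples) >= target_len:
        # Long clip: keep only the first `target_len` samples.
        return samples[:target_len]
    # Short clip: append silence until the target length is reached.
    return samples + [0.0] * (target_len - len(samples))
```

The fixed-length waveform would then be passed to the feature extractor together with `sampling_rate=16000`.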
Performance
- Metric: Macro F1 Score
- Achieved: 0.87
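Macro F1 is the unweighted mean of the per-class F1 scores, so each of the 10 genres counts equally regardless of how many test clips it has. The sketch below shows the calculation in plain Python; the function name `macro_f1` is illustrative, and this is not the evaluation script used for the reported score.

```python
from collections import Counter


def macro_f1(y_true, y_pred, labels=None):
    """Unweighted mean of per-class F1 over `labels`."""
    if labels is None:
        labels = sorted(set(y_true) | set(y_pred))

    tp, fp, fn = Counter(), Counter(), Counter()
    for truth, pred in zip(y_true, y_pred):
        if truth == pred:
            tp[truth] += 1
        else:
            fp[pred] += 1    # predicted class gets a false positive
            fn[truth] += 1   # true class gets a false negative

    scores = []
    for label in labels:
        # F1 = 2*TP / (2*TP + FP + FN); defined as 0 when the class never appears.
        denom = 2 * tp[label] + fp[label] + fn[label]
        scores.append(2 * tp[label] / denom if denom else 0.0)
    return sum(scores) / len(scores)
```

In practice, `sklearn.metrics.f1_score(y_true, y_pred, average="macro")` computes the same quantity.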
Live Demo
Try the model here:
https://huggingface.co/spaces/msaligs/music-genre-classifier
Usage
from transformers import ASTForAudioClassification, ASTFeatureExtractor
# Load the fine-tuned classifier and its matching feature extractor from the Hub
model_id = "msaligs/ast_fine_tuned_music_genre_10"
model = ASTForAudioClassification.from_pretrained(model_id)
feature_extractor = ASTFeatureExtractor.from_pretrained(model_id)
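Once the model and feature extractor are loaded, a prediction could be obtained roughly as sketched below. This is not taken from the original card: `librosa` for audio loading and the helper names `rank_genres` and `predict_genre` are assumptions, and the label names are read from `model.config.id2label` rather than hard-coded.

```python
import math


def rank_genres(logits, id2label, top_k=3):
    """Convert raw logits into (label, probability) pairs, best first."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]   # numerically stable softmax
    total = sum(exps)
    probs = [e / total for e in exps]
    order = sorted(range(len(probs)), key=probs.__getitem__, reverse=True)
    return [(id2label[i], probs[i]) for i in order[:top_k]]


def predict_genre(path, model, feature_extractor, top_k=3):
    """Classify one audio file (hypothetical helper, assumes librosa is installed)."""
    # Heavy dependencies are imported here so rank_genres stays standalone.
    import librosa
    import torch

    # Load as mono and resample to the 16 kHz rate listed under Preprocessing.
    waveform, _ = librosa.load(path, sr=16000, mono=True)
    inputs = feature_extractor(waveform, sampling_rate=16000, return_tensors="pt")
    with torch.no_grad():
        logits = model(**inputs).logits
    return rank_genres(logits[0].tolist(), model.config.id2label, top_k)
```

For example, `predict_genre("song.wav", model, feature_extractor)` would return the three most likely genres with their probabilities, where `song.wav` is a placeholder path.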
Evaluation results
- F1 on Messy Mashup Dataset (self-reported): 0.XX