🎡 Music Genre Classification using AST

📌 Model Overview

This model is a fine-tuned Audio Spectrogram Transformer (AST) for music genre classification.

It predicts one of the following 10 genres:

  • blues, classical, country, disco, hiphop
  • jazz, metal, pop, reggae, rock

🧠 Architecture

  • Base Model: MIT/ast-finetuned-audioset
  • Type: Transformer (Audio Spectrogram Transformer)
  • Framework: PyTorch + Hugging Face Transformers

🎯 Task

Audio Classification (Music Genre Classification)


📊 Dataset

  • Training Data: Clean instrument stems (drums, vocals, bass, others)
  • Test Data: Noisy mashups with:
    • cross-song mixing
    • tempo variation
    • ESC-50 noise injection
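The card does not say how noise is injected into the test mashups. As an illustration only, one common approach is to mix a noise clip into the music at a chosen signal-to-noise ratio. The sketch below assumes mono NumPy waveforms at the same sampling rate; the function name `mix_at_snr` and the `snr_db` parameter are hypothetical and are not taken from the author's pipeline.

```python
import numpy as np


def mix_at_snr(signal, noise, snr_db):
    """Add `noise` to `signal` so the result has roughly `snr_db` dB of SNR.

    Illustrative assumption: the noise clip is looped or trimmed to match
    the length of the signal before mixing.
    """
    signal = np.asarray(signal, dtype=np.float32)
    noise = np.asarray(noise, dtype=np.float32)

    # Loop the noise clip if it is shorter than the signal, then trim.
    repeats = int(np.ceil(len(signal) / len(noise)))
    noise = np.tile(noise, repeats)[: len(signal)]

    signal_power = np.mean(signal ** 2)
    noise_power = np.mean(noise ** 2)
    if noise_power == 0:
        return signal.copy()  # silent noise clip: nothing to add

    # Scale the noise so that signal_power / scaled_noise_power = 10^(snr_db / 10).
    scale = np.sqrt(signal_power / (noise_power * 10 ** (snr_db / 10)))
    return signal + scale * noise
```

A lower `snr_db` gives a noisier mix; the values used for the actual test set are not stated in this card.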

βš™οΈ Preprocessing

  • Sampling rate: 16 kHz
  • Fixed duration: 15 seconds
  • Clips are padded or truncated to the fixed duration
  • Feature extraction using ASTFeatureExtractor
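The preprocessing steps above can be sketched roughly as follows. This is a minimal sketch, not the author's code: it assumes the audio has already been loaded as a mono waveform and resampled to 16 kHz, that padding means trailing zeros, and that truncation keeps the start of the clip. The helper names `fix_length` and `extract_features` are hypothetical.

```python
import numpy as np

TARGET_SR = 16_000                      # 16 kHz, as listed above
CLIP_SECONDS = 15                       # fixed duration, as listed above
TARGET_LEN = TARGET_SR * CLIP_SECONDS   # 240,000 samples


def fix_length(waveform, target_len=TARGET_LEN):
    """Truncate or zero-pad a mono waveform to exactly `target_len` samples.

    Assumption: padding is trailing zeros and truncation keeps the start.
    """
    waveform = np.asarray(waveform, dtype=np.float32)
    if len(waveform) >= target_len:
        return waveform[:target_len]
    return np.pad(waveform, (0, target_len - len(waveform)))


def extract_features(waveform, feature_extractor):
    """Run an already-loaded ASTFeatureExtractor on a length-fixed clip."""
    clip = fix_length(waveform)
    return feature_extractor(clip, sampling_rate=TARGET_SR, return_tensors="pt")
```

Note that ASTFeatureExtractor also pads or truncates the spectrogram to its own `max_length` setting (1024 frames by default), so how much of the 15 seconds reaches the model depends on the configuration stored with the checkpoint.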

📈 Performance

  • Metric: Macro F1 Score
  • Achieved: 0.87
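For readers unfamiliar with the metric: macro F1 computes an F1 score for each genre separately and then takes the unweighted mean, so every genre counts equally regardless of how many clips it has. The sketch below is a plain-Python illustration of that definition, not the evaluation script behind the reported score; the function name and the toy labels are made up for the example.

```python
def macro_f1(y_true, y_pred, labels):
    """Unweighted mean of per-class F1 scores.

    A class with no true and no predicted examples contributes 0.0.
    """
    f1_scores = []
    for label in labels:
        tp = sum(t == label and p == label for t, p in zip(y_true, y_pred))
        fp = sum(t != label and p == label for t, p in zip(y_true, y_pred))
        fn = sum(t == label and p != label for t, p in zip(y_true, y_pred))
        denominator = 2 * tp + fp + fn
        f1_scores.append(2 * tp / denominator if denominator else 0.0)
    return sum(f1_scores) / len(f1_scores)


# Toy example: one "rock" clip is misclassified as "jazz".
y_true = ["rock", "rock", "jazz", "pop"]
y_pred = ["rock", "jazz", "jazz", "pop"]
score = macro_f1(y_true, y_pred, labels=["rock", "jazz", "pop"])
# Per-class F1: rock = 2/3, jazz = 2/3, pop = 1, so score = 7/9
```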

🔗 Live Demo

Try the model here:
👉 https://huggingface.co/spaces/msaligs/music-genre-classifier

🚀 Usage

from transformers import ASTForAudioClassification, ASTFeatureExtractor

model = ASTForAudioClassification.from_pretrained("msaligs/ast_fine_tuned_music_genre_10")
feature_extractor = ASTFeatureExtractor.from_pretrained("msaligs/ast_fine_tuned_music_genre_10")
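Once `model` and `feature_extractor` are loaded, a prediction could look roughly like the sketch below. It assumes a mono waveform already resampled to 16 kHz and assumes the checkpoint's config maps class ids to genre names in `id2label`; the helper names `rank_genres` and `predict_genres` are hypothetical and not part of the library.

```python
def rank_genres(probabilities, id2label, top_k=3):
    """Pair each class probability with its label; return the `top_k` most likely."""
    ranked = sorted(enumerate(probabilities), key=lambda pair: pair[1], reverse=True)
    return [(id2label[idx], prob) for idx, prob in ranked[:top_k]]


def predict_genres(waveform, model, feature_extractor, sampling_rate=16_000, top_k=3):
    """Return the `top_k` (genre, probability) pairs for one mono waveform."""
    import torch  # imported here so rank_genres stays usable without PyTorch

    # Convert the raw waveform into the spectrogram tensor the model expects.
    inputs = feature_extractor(waveform, sampling_rate=sampling_rate, return_tensors="pt")

    model.eval()
    with torch.no_grad():  # inference only, no gradients needed
        logits = model(**inputs).logits

    # Softmax over the class dimension, then take the single item in the batch.
    probabilities = logits.softmax(dim=-1)[0].tolist()
    return rank_genres(probabilities, model.config.id2label, top_k=top_k)
```

Calling `predict_genres(waveform, model, feature_extractor)` would then return a list such as genre names paired with probabilities, ordered from most to least likely.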