Audio & Speech

Evaluating Pretrained General-Purpose Audio Representations for Music Genre Classification

New research shows self-supervised BYOL-A embeddings outperform PANNs and VGGish for music genre classification.

Deep Dive

A new study presented at the International Conference on Pattern Recognition and Machine Intelligence (PReMI) 2025 provides a comprehensive benchmark for using pretrained, general-purpose audio representation models to classify music genres. Researchers Kashish Rai and Mrinmoy Bhattacharjee systematically evaluated embeddings from self-supervised learning models like BYOL-A against established models such as PANNs and VGGish. Their key finding is that BYOL-A embeddings, when fed to a custom deep neural network classifier, deliver superior performance, achieving 81.5% accuracy on the standard GTZAN dataset and 64.3% on the more challenging FMA-Small dataset.
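
To make the embedding-plus-classifier pipeline concrete, here is a minimal sketch: a small PyTorch MLP trained on precomputed audio embeddings. The embedding dimension, layer sizes, and training hyperparameters are illustrative assumptions, not the architecture from the paper, and the embedding extraction step is assumed to have happened upstream.

```python
# Minimal sketch: train an MLP genre classifier on precomputed audio
# embeddings (e.g., from BYOL-A, PANNs, or VGGish). EMB_DIM, the layer
# sizes, and the hyperparameters are assumptions for illustration.
import torch
import torch.nn as nn
from torch.utils.data import DataLoader, TensorDataset

EMB_DIM = 2048   # assumed embedding size; match your encoder's output
N_CLASSES = 10   # GTZAN has 10 genres

class GenreClassifier(nn.Module):
    def __init__(self, emb_dim: int, n_classes: int):
        super().__init__()
        # A deep MLP head over fixed embeddings; deeper than a single
        # linear probe, which is the comparison the study draws.
        self.net = nn.Sequential(
            nn.Linear(emb_dim, 512), nn.BatchNorm1d(512), nn.ReLU(), nn.Dropout(0.3),
            nn.Linear(512, 128), nn.BatchNorm1d(128), nn.ReLU(), nn.Dropout(0.3),
            nn.Linear(128, n_classes),
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.net(x)

def train(embeddings: torch.Tensor, labels: torch.Tensor, epochs: int = 30):
    model = GenreClassifier(EMB_DIM, N_CLASSES)
    opt = torch.optim.Adam(model.parameters(), lr=1e-3)
    loss_fn = nn.CrossEntropyLoss()
    # drop_last avoids a size-1 batch, which BatchNorm cannot handle
    loader = DataLoader(TensorDataset(embeddings, labels),
                        batch_size=64, shuffle=True, drop_last=True)
    model.train()
    for _ in range(epochs):
        for x, y in loader:
            opt.zero_grad()
            loss_fn(model(x), y).backward()
            opt.step()
    return model
```

The appeal of this setup is that only the lightweight head is trained; the pretrained encoder stays frozen, so no GPU-heavy fine-tuning is required.
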

The proposed DNN architecture was a significant factor, boosting accuracy by 10-16% compared to using basic linear classifiers on the same embeddings. The researchers also tackled the challenge of cross-dataset generalization by creating a unified 18-class label space from GTZAN and FMA-Small for joint training. While this caused a slight performance drop on GTZAN, it yielded comparable results on FMA-Small, demonstrating a more robust model. All scripts from this work are publicly available, offering a practical toolkit for developers and researchers looking to implement state-of-the-art audio classification without training models from scratch.
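
The 18-class figure matches the disjoint union of GTZAN's 10 genres and FMA-Small's 8, which suggests the unified space keeps every dataset-genre pair distinct rather than merging similarly named genres. The dataset-prefixing scheme below is an assumption for illustration, not the paper's exact mapping.

```python
# Sketch of a unified label space for joint training. GTZAN has 10
# genres and FMA-Small has 8; 10 + 8 = 18 matches the paper's count,
# so this sketch assumes a disjoint union with no genre merging.
GTZAN_GENRES = ["blues", "classical", "country", "disco", "hiphop",
                "jazz", "metal", "pop", "reggae", "rock"]
FMA_SMALL_GENRES = ["Electronic", "Experimental", "Folk", "Hip-Hop",
                    "Instrumental", "International", "Pop", "Rock"]

# Prefix each label with its dataset so, e.g., GTZAN "pop" and
# FMA-Small "Pop" remain distinct classes in the joint space.
UNIFIED = ([f"gtzan/{g}" for g in GTZAN_GENRES] +
           [f"fma/{g}" for g in FMA_SMALL_GENRES])
LABEL_TO_ID = {label: i for i, label in enumerate(UNIFIED)}

def to_unified_id(dataset: str, genre: str) -> int:
    """Map a (dataset, genre) pair to its id in the 18-class space."""
    return LABEL_TO_ID[f"{dataset}/{genre}"]

assert len(UNIFIED) == 18
print(to_unified_id("gtzan", "jazz"))  # 5
print(to_unified_id("fma", "Rock"))    # 17
```
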

Key Points
  • BYOL-A embeddings outperformed PANNs and VGGish, scoring 81.5% accuracy on GTZAN and 64.3% on FMA-Small.
  • A custom deep neural network (DNN) classifier provided a 10-16% accuracy boost over standard linear classifiers.
  • The study addressed cross-dataset challenges by unifying GTZAN and FMA-Small into an 18-class label space for joint training.

Why It Matters

This benchmark enables more accurate AI for music streaming recommendations, content tagging, and audio analysis tools.