Audio & Speech

SongBench: New benchmark rates AI songs across 7 professional dimensions

arXiv eess.AS April 30, 2026

⚡11,717 expert-labeled samples reveal gaps in current text-to-song models...

Deep Dive

Recent advances in Text-to-Song generation have produced realistic musical outputs, but existing evaluation benchmarks lack the professional granularity to assess multi-dimensional aesthetic nuances. To address this, researchers from multiple institutions introduce SongBench, a specialized framework for fine-grained song quality assessment. SongBench evaluates across seven key dimensions: Vocal, Instrument, Melody, Structure, Arrangement, Mixing, and Musicality. The team constructed an expert-annotated database comprising 11,717 samples from state-of-the-art models, each labeled by music professionals. Extensive experiments show SongBench achieves high correlation with expert ratings, effectively diagnosing strengths and weaknesses in current models.

By revealing fine-grained performance gaps in state-of-the-art systems, SongBench serves as a diagnostic benchmark to guide future development toward more professional and musically coherent song generation. It is particularly useful for researchers and developers working on AI music generation, because it offers a standardized way to measure and compare model quality that goes beyond informal subjective listening. Scoring each dimension separately also makes trade-offs visible: if an improvement in one area (e.g., melody) comes at the expense of another (e.g., mixing), the per-dimension scores will show it.
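To make the "high correlation with expert ratings" claim concrete, here is a minimal sketch of how agreement between a metric and expert labels could be checked per dimension. It is an illustration under stated assumptions, not SongBench's actual evaluation code: the article does not say which correlation statistic the authors use, so Spearman rank correlation is assumed here, and the function names and the dict-of-lists data layout are hypothetical.

```python
from statistics import mean

# The seven dimensions named in the article.
DIMENSIONS = ["Vocal", "Instrument", "Melody", "Structure",
              "Arrangement", "Mixing", "Musicality"]


def average_ranks(values):
    """Rank values from 1..n, giving tied values the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j to cover the whole run of tied values.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # average of the 1-based positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation of two equal-length sequences."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    if vx == 0 or vy == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / (vx * vy) ** 0.5


def spearman(x, y):
    """Spearman rank correlation: Pearson correlation of the ranks."""
    return pearson(average_ranks(x), average_ranks(y))


def per_dimension_correlation(expert, predicted):
    """Correlate expert and predicted scores for each shared dimension.

    Both arguments are assumed (hypothetically) to map a dimension name
    to a list of scores, one per song, in the same song order.
    """
    return {d: spearman(expert[d], predicted[d])
            for d in DIMENSIONS if d in expert and d in predicted}
```

Calling `per_dimension_correlation(expert_scores, metric_scores)` would return one coefficient per dimension, so a metric that tracks experts well on, say, Vocal but poorly on Mixing would show that split rather than hiding it in a single overall number.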

Key Points

Evaluates songs across 7 dimensions: Vocal, Instrument, Melody, Structure, Arrangement, Mixing, and Musicality
Database includes 11,717 expert-annotated samples from state-of-the-art models
Achieves high correlation with professional human ratings for diagnostic benchmarking

Why It Matters

SongBench provides a standardized, professional-grade metric to accelerate improvement in AI music generation quality.
