SongBench: A Fine-Grained Multi-Aspect Benchmark for Song Quality Assessment
11,717 expert-labeled samples reveal gaps in current text-to-song models...
Recent advances in text-to-song generation have produced realistic musical outputs, but existing evaluation benchmarks lack the professional granularity needed to assess multi-dimensional aesthetic nuances. To address this, researchers from multiple institutions introduce SongBench, a specialized framework for fine-grained song quality assessment. SongBench evaluates songs across seven key dimensions: Vocal, Instrument, Melody, Structure, Arrangement, Mixing, and Musicality. The team constructed an expert-annotated database of 11,717 samples generated by state-of-the-art models, each labeled by music professionals. Extensive experiments show that SongBench's assessments correlate highly with expert ratings, making it effective for diagnosing strengths and weaknesses in current models.
By revealing fine-grained performance gaps in state-of-the-art systems, SongBench serves as a diagnostic benchmark to guide future development toward more professional and musically coherent song generation. This tool is particularly valuable for researchers and developers working on AI music generation, offering a standardized way to measure and compare model quality beyond informal subjective listening. Because each aspect is scored separately, the benchmark makes trade-offs visible: an improvement in one area (e.g., melody) that comes at the expense of another (e.g., mixing) shows up in the per-dimension results, which encourages holistic advancement in AI music creation.
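To make the trade-off idea concrete, here is a minimal Python sketch that compares per-dimension scores for two model versions and flags any dimension that got worse. The function name, the tolerance parameter, and the numeric scores are hypothetical illustrations; the summary does not describe SongBench's score scale or tooling.

```python
def dimension_regressions(baseline, candidate, tolerance=0.0):
    """Return {dimension: delta} for every dimension where the candidate
    scores lower than the baseline by more than `tolerance`.

    Both arguments map a dimension name to a mean score (assumed: higher is better).
    """
    return {
        dim: candidate[dim] - baseline[dim]
        for dim in baseline
        if dim in candidate and baseline[dim] - candidate[dim] > tolerance
    }


# Made-up scores for illustration only, not SongBench results.
baseline = {"Melody": 3.0, "Mixing": 3.5, "Vocal": 4.0}
candidate = {"Melody": 3.5, "Mixing": 3.0, "Vocal": 4.0}

regressions = dimension_regressions(baseline, candidate)
print(regressions)
```

In this toy case the melody score improves while mixing drops, so only Mixing is reported. A single overall score could hide that kind of regression.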
- Evaluates songs across 7 dimensions: Vocal, Instrument, Melody, Structure, Arrangement, Mixing, and Musicality
- Database includes 11,717 expert-annotated samples from state-of-the-art models
- Correlates highly with professional human ratings, supporting its use as a diagnostic benchmark
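One common way to check agreement between an automatic assessor and expert labels is a rank correlation computed separately for each dimension. The sketch below uses Spearman correlation in plain Python. The choice of Spearman, the function names, and the data layout are assumptions made for illustration; the summary does not say which correlation measure the SongBench authors report.

```python
from statistics import mean

# The seven dimensions named in the article.
DIMENSIONS = ["Vocal", "Instrument", "Melody", "Structure",
              "Arrangement", "Mixing", "Musicality"]


def average_ranks(values):
    """Return 1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j to cover the run of values equal to the one at position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        tied_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = tied_rank
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation; returns NaN if either input is constant."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    var_x = sum((a - mx) ** 2 for a in x)
    var_y = sum((b - my) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        return float("nan")
    return cov / (var_x * var_y) ** 0.5


def spearman(x, y):
    """Spearman correlation: Pearson correlation of the average ranks."""
    return pearson(average_ranks(x), average_ranks(y))


def per_dimension_correlation(predicted, expert):
    """Both arguments map dimension -> list of scores in the same sample order."""
    return {
        dim: spearman(predicted[dim], expert[dim])
        for dim in DIMENSIONS
        if dim in predicted and dim in expert
    }
```

Calling `per_dimension_correlation` with predicted and expert scores for the same samples returns one coefficient per dimension, which shows where an automatic assessor tracks expert judgment closely and where it does not.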
Why It Matters
SongBench provides a standardized, professional-grade metric to accelerate improvement in AI music generation quality.