Revisiting Content-Based Music Recommendation: Efficient Feature Aggregation from Large-Scale Music Models
New framework leverages self-supervised music models for 4x better cold-start recommendations
A team of researchers from Nanjing University has released TASTE, a comprehensive dataset and benchmarking framework designed to push content-based music recommendation beyond traditional collaborative filtering. The dataset pairs raw audio signals with descriptive textual metadata, filling a key gap in existing music recommendation benchmarks, which typically lack rich multimodal information. Leveraging recent large-scale self-supervised music encoders, TASTE demonstrates that the extracted audio representations significantly improve performance on both candidate recall and click-through rate (CTR) tasks.
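The paper's exact extraction pipeline isn't reproduced here, but the core idea of content-based candidate recall is simple: pool frame-level encoder outputs into a single track embedding and retrieve neighbors by similarity. The sketch below mocks the encoder with random features; names like `encode_frames` and `track_embedding` are illustrative assumptions, not the TASTE codebase.

```python
# Minimal sketch of content-based candidate recall: each track is represented
# by an embedding pooled from a pre-trained music encoder's frame features.
# The encoder is mocked with random numbers; all names here are illustrative.
import numpy as np

rng = np.random.default_rng(0)

def encode_frames(audio: np.ndarray, dim: int = 512) -> np.ndarray:
    """Stand-in for a self-supervised music encoder (frames x dim)."""
    n_frames = max(1, len(audio) // 320)   # e.g. one feature frame per 320 samples
    return rng.standard_normal((n_frames, dim))

def track_embedding(audio: np.ndarray) -> np.ndarray:
    """Mean-pool frame features into one L2-normalized track vector."""
    pooled = encode_frames(audio).mean(axis=0)
    return pooled / np.linalg.norm(pooled)

# Toy catalogue of 1000 tracks; recall the top-k most similar tracks to a
# query purely from audio content -- no interaction data required.
catalogue = np.stack([track_embedding(rng.standard_normal(16000)) for _ in range(1000)])
query = track_embedding(rng.standard_normal(16000))
scores = catalogue @ query                 # cosine similarity (unit vectors)
top_k = np.argsort(-scores)[:10]
print("recalled track ids:", top_k)
```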
The paper also introduces MuQ-token, a novel feature aggregation method that efficiently combines multi-layer audio features from pre-trained music models. MuQ-token consistently outperforms alternative aggregation strategies across evaluation settings, providing a reusable multimodal foundation for future research. The work is particularly valuable for cold-start scenarios, where collaborative filtering fails because little or no user interaction data exists. The code is publicly available, enabling other researchers and streaming platforms to adopt these content-driven approaches for more accurate, personalized music recommendations.
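The MuQ-token mechanism itself is defined in the paper and not reproduced here; as one hedged illustration of what multi-layer aggregation can look like, the sketch below uses learned softmax weights over all encoder layers (an ELMo-style scheme). The `LayerAggregator` module and every dimension are assumptions made for the example.

```python
# Hedged sketch of multi-layer feature aggregation: learn one convex weight
# per encoder layer, mix the layers, and project. This is a generic scheme,
# not the paper's MuQ-token; all names and shapes are illustrative.
import torch
import torch.nn as nn

class LayerAggregator(nn.Module):
    """Combine hidden states from all encoder layers into one feature sequence."""
    def __init__(self, num_layers: int, dim: int):
        super().__init__()
        self.layer_logits = nn.Parameter(torch.zeros(num_layers))  # one logit per layer
        self.proj = nn.Linear(dim, dim)                            # shared projection

    def forward(self, hidden_states: torch.Tensor) -> torch.Tensor:
        # hidden_states: (num_layers, batch, frames, dim)
        w = torch.softmax(self.layer_logits, dim=0)                # convex layer weights
        mixed = torch.einsum("l,lbtd->btd", w, hidden_states)      # weighted layer sum
        return self.proj(mixed)

# Toy usage: 13 layers (12 transformer blocks + input), batch of 2 audio clips.
agg = LayerAggregator(num_layers=13, dim=768)
states = torch.randn(13, 2, 250, 768)
track_features = agg(states)   # (2, 250, 768), ready for pooling / downstream heads
print(track_features.shape)
```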
- TASTE dataset integrates raw audio signals and textual metadata for multimodal music recommendation
- MuQ-token aggregates multi-layer audio features from self-supervised music encoders, outperforming other methods
- Content-based approach improves cold-start recommendations where collaborative filtering struggles (a concrete sketch follows this list)
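To make the cold-start point concrete, here is a minimal sketch of how a content embedding lets a day-one release be scored for a user despite zero interaction history. The mean-pooled `user_profile` and all dimensions are illustrative assumptions, not the paper's method.

```python
# Hedged cold-start sketch: a brand-new track has no play counts, so
# collaborative filtering cannot score it, but its audio embedding can
# still be compared against a user's content profile. Names are illustrative.
import numpy as np

rng = np.random.default_rng(1)

def normalize(v: np.ndarray) -> np.ndarray:
    return v / np.linalg.norm(v, axis=-1, keepdims=True)

liked_tracks = normalize(rng.standard_normal((50, 512)))  # embeddings of tracks the user played
user_profile = normalize(liked_tracks.mean(axis=0))       # simple mean-pooled taste vector

new_track = normalize(rng.standard_normal(512))           # day-one release, no interactions
affinity = float(user_profile @ new_track)                # content-only relevance score
print(f"cold-start affinity: {affinity:.3f}")
```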
Why It Matters
Content-based music recommendation could reduce cold-start failures and improve personalization for new users and songs.