Deploying Semantic ID-based Generative Retrieval for Large-Scale Podcast Discovery at Spotify

Spotify's new generative AI recommender increases new-show discovery by up to 14.3% in online A/B tests while meeting strict production latency and cost constraints.

Deep Dive

A team of 43 Spotify researchers has detailed GLIDE, a generative recommendation system for podcasts that moves beyond traditional collaborative filtering. Its core idea is to formulate recommendation as an instruction-following task over a catalog encoded with Semantic IDs: discrete, hierarchical tokens that represent podcast content. Generating over these tokens grounds a large language model (LLM) in Spotify's actual inventory, so it can produce relevant podcast IDs directly. The model conditions on a user's recent listening history for short-term context and injects long-term user embeddings as soft prompts to preserve personalization, all within the strict latency and cost constraints of a production service.
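
One common way to keep generation grounded in a catalog is to restrict each decoding step to tokens that continue a valid Semantic ID. The sketch below illustrates that idea with a prefix tree and greedy decoding. It is not taken from the GLIDE paper: the catalog, the token values, and `toy_score_fn` (a stand-in for the LLM's next-token scores) are all hypothetical, and it assumes that no Semantic ID is a prefix of another.

```python
from typing import Callable, Dict, List, Optional, Tuple

SemanticID = Tuple[int, ...]


class TrieNode:
    """One node of a prefix tree over catalog Semantic IDs."""

    def __init__(self) -> None:
        self.children: Dict[int, "TrieNode"] = {}
        self.item: Optional[str] = None  # set only where a full ID ends


def build_trie(catalog: Dict[str, SemanticID]) -> TrieNode:
    """Index every catalog item by its sequence of Semantic ID tokens."""
    root = TrieNode()
    for item, semantic_id in catalog.items():
        node = root
        for token in semantic_id:
            node = node.children.setdefault(token, TrieNode())
        node.item = item
    return root


def constrained_greedy_decode(
    root: TrieNode,
    score_fn: Callable[[List[int], int], float],
) -> Tuple[SemanticID, Optional[str]]:
    """Greedily pick the best-scoring token among those the trie allows."""
    node, prefix = root, []
    while node.children:
        # Only tokens that extend a real catalog ID are candidates.
        token = max(node.children, key=lambda t: score_fn(prefix, t))
        prefix.append(token)
        node = node.children[token]
    return tuple(prefix), node.item


# Hypothetical three-show catalog with three-level Semantic IDs.
CATALOG: Dict[str, SemanticID] = {
    "show_a": (1, 4, 2),
    "show_b": (1, 4, 7),
    "show_c": (3, 0, 5),
}

# Hypothetical scores; token 9 scores highest but is not in the catalog.
TOY_SCORES = {9: 5.0, 1: 2.0, 4: 1.5, 3: 1.0, 7: 0.9, 2: 0.3}


def toy_score_fn(prefix: List[int], token: int) -> float:
    """Stand-in for an LLM's next-token score; ignores the prefix."""
    return TOY_SCORES.get(token, 0.0)


semantic_id, show = constrained_greedy_decode(build_trie(CATALOG), toy_score_fn)
print(semantic_id, show)  # (1, 4, 7) show_b
```

Although token 9 has the highest raw score, it is never emitted because no catalog ID contains it, so the decoded sequence always resolves to a real show. A production system would more likely use beam search over the same constraint to return several candidates.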

Deploying such a semantically aware model at Spotify's scale raised three main challenges: catalog grounding, personalization, and serving speed. The team addressed these with the Semantic ID framework, which enables efficient generation over millions of items. They evaluated GLIDE using offline metrics, human judgments, LLM-based evaluation, and, ultimately, large-scale online A/B testing. In experiments involving millions of active users, GLIDE increased streaming of non-habitual podcasts on the home surface by up to 5.4% and drove discovery of entirely new shows by up to 14.3%. The work is a real-world deployment of generative retrieval, in which an LLM does not just rank items but generates personalized, context-aware recommendations.

Key Points
  • GLIDE formulates podcast recommendation as an instruction-following task for an LLM over a catalog of Semantic IDs, enabling grounded generation.
  • The system increased new-show discovery by up to 14.3% and non-habitual podcast streaming by up to 5.4% in A/B tests with millions of Spotify users.
  • It blends short-term context (recent listens) with long-term user preferences via soft prompts, operating within production latency and cost limits.
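
The soft-prompt idea in the last point can be sketched as follows: a long-term user embedding is projected into a few "virtual token" vectors that are prepended to the embeddings of the recent-history tokens. This is a minimal illustration under stated assumptions, not the paper's implementation. The dimensions, the projection matrix (which would be learned in practice), and all function names are hypothetical.

```python
from typing import List

Vector = List[float]
Matrix = List[Vector]


def matvec(matrix: Matrix, vec: Vector) -> Vector:
    """Plain matrix-vector product."""
    return [sum(w * x for w, x in zip(row, vec)) for row in matrix]


def user_soft_prompt(user_emb: Vector, projection: Matrix, d_model: int) -> Matrix:
    """Map a user embedding to soft-prompt vectors of width d_model.

    `projection` has shape (num_soft_tokens * d_model, len(user_emb)).
    """
    flat = matvec(projection, user_emb)
    if len(flat) % d_model != 0:
        raise ValueError("projection rows must be a multiple of d_model")
    # Split the flat output into one vector per virtual token.
    return [flat[i:i + d_model] for i in range(0, len(flat), d_model)]


def build_model_input(soft_prompt: Matrix, history_embeddings: Matrix) -> Matrix:
    """Prepend long-term soft-prompt vectors to short-term history embeddings."""
    return soft_prompt + history_embeddings


# Hypothetical sizes: 3-dim user embedding, model width 2, two soft tokens.
user_emb = [1.0, 0.0, 2.0]
projection = [
    [1.0, 0.0, 0.0],
    [0.0, 1.0, 0.0],
    [0.0, 0.0, 1.0],
    [1.0, 1.0, 1.0],
]
history_embeddings = [[0.5, 0.5], [0.1, 0.2], [0.3, 0.4]]  # recent listens

soft_prompt = user_soft_prompt(user_emb, projection, d_model=2)
model_input = build_model_input(soft_prompt, history_embeddings)
print(soft_prompt)       # [[1.0, 0.0], [2.0, 3.0]]
print(len(model_input))  # 5
```

The design point is that long-term preferences enter as continuous vectors rather than extra text tokens, so they add a fixed, small number of positions to the input regardless of how long the user's history is.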

Why It Matters

Shows that generative AI can power discovery at scale, moving beyond simple ranking to produce intent-aware, exploratory recommendations for users.