A Unified Language Model for Large Scale Search, Recommendation, and Reasoning
A single AI model from Spotify researchers handles a 10M-item catalog for search, recommendations, and reasoning without external tools.
A large research team, including scientists from Spotify, has published a paper introducing NEO, a novel framework designed to solve a major challenge in applying large language models (LLMs) to real-world platforms. Current systems struggle to deploy a single, end-to-end model that can handle search, recommendation, and reasoning tasks over massive, heterogeneous catalogs. Tool-augmented systems add complexity, while text-only generation fails to reliably reference specific real-world items. NEO addresses this by adapting a pre-trained decoder-only LLM into a 'catalog-grounded generator' that operates without external tools.
NEO's core innovation is treating catalog items as a distinct modality. It represents items as Structured IDs (SIDs) and trains a single model to interleave natural language and these typed item identifiers within a shared sequence. Text prompts can thus control the task, the target entity type, and the output format—pure text, specific item IDs, or a mix of both. The researchers term this fine-grained control 'language-steerability.' A key technical feature is constrained decoding, which guarantees that any item ID the model emits corresponds to a valid catalog entry, without restricting its ability to produce free-form text.
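The paper does not spell out NEO's decoding machinery, but a common way to implement this kind of catalog-grounded generation is a prefix trie over valid item-ID token sequences, with logit masking active only inside an item span. The sketch below illustrates that idea; the `SidTrie` class, the `<item>`/`</item>` delimiter tokens, and the toy SID scheme are all illustrative assumptions, not NEO's actual implementation.

```python
# Sketch of trie-constrained decoding for mixed text/item-ID sequences.
# Outside an <item> span the model generates free text; inside, only
# token continuations that stay within the catalog are permitted.

class SidTrie:
    """Prefix tree over the token sequences of valid catalog item IDs."""
    def __init__(self):
        self.root = {}

    def add(self, tokens):
        node = self.root
        for t in tokens:
            node = node.setdefault(t, {})
        node[None] = True  # None marks a complete, valid item ID

    def _walk(self, prefix):
        node = self.root
        for t in prefix:
            node = node.get(t)
            if node is None:
                return None
        return node

    def allowed_next(self, prefix):
        """Tokens that can legally extend `prefix` toward a valid item ID."""
        node = self._walk(prefix)
        return set() if node is None else {t for t in node if t is not None}

    def is_complete(self, prefix):
        node = self._walk(prefix)
        return node is not None and None in node


def allowed_tokens(seq, trie, open_tok="<item>", close_tok="</item>"):
    """Return the allowed-token set for the next step, or None for free text."""
    if open_tok not in seq:
        return None
    i = len(seq) - 1 - seq[::-1].index(open_tok)  # last opening tag
    if close_tok in seq[i:]:
        return None  # span already closed -> back to unconstrained text
    prefix = seq[i + 1:]
    allowed = trie.allowed_next(prefix)
    if trie.is_complete(prefix):
        allowed.add(close_tok)  # a full SID may now be closed
    return allowed


def constrain(logits, allowed):
    """Mask logits of tokens outside `allowed`; pass through for free text."""
    if allowed is None:
        return logits
    return {tok: (score if tok in allowed else float("-inf"))
            for tok, score in logits.items()}
```

With a trie holding `["track", "042", "7f"]`, a sequence like `["Play", "<item>", "track", "042"]` can only continue with trie-valid codes, while text before `<item>` and after `</item>` is left unconstrained—matching the article's point that grounding does not limit free-form generation.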
The framework was evaluated at scale on a real-world catalog of over 10 million items spanning multiple media types. In offline experiments on discovery tasks such as recommendation and search, NEO consistently outperformed strong task-specific baselines. Crucially, it also demonstrated cross-task transfer: improvements in one area (such as search) carried over to another (such as recommendation). This points to a practical path for consolidating fragmented, large-scale discovery systems into a single, more efficient, and controllable generative model—potentially simplifying architecture and improving performance for major digital platforms.
- NEO trains a single LLM to handle search, recommendation, and reasoning by interleaving natural language with catalog item IDs (SIDs).
- The model was tested on a massive real-world catalog of over 10 million items, outperforming task-specific baselines.
- Its 'language-steerable' design allows text prompts to control the task and output format, enabling tool-free, catalog-grounded generation.
Why It Matters
This research points toward simpler, more powerful AI architectures for major platforms like Spotify or Amazon, combining multiple discovery systems into one efficient model.