Scaling Multilingual Semantic Search in Uber Eats Delivery
The system fine-tunes a Qwen2 model on hundreds of millions of anonymized user interactions.
A team of Uber engineers has detailed a new production-scale semantic search system for the Uber Eats platform. The system unifies retrieval across three distinct verticals (restaurant stores, individual dishes, and grocery/retail items) within a single multilingual framework. To achieve this, the team fine-tuned a two-tower model built on a Qwen2 base, training it on a massive dataset of hundreds of millions of anonymized query-document interactions. Training combined an InfoNCE loss over in-batch negatives with a triplet-NCE loss over mined hard negatives to improve ranking accuracy.
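The two losses can be sketched as follows. This is a minimal illustration of the general techniques named in the article (InfoNCE over in-batch negatives plus a margin-based triplet loss over hard negatives), not Uber's actual training code; the function names, temperature, and margin values are illustrative assumptions.

```python
import torch
import torch.nn.functional as F

def info_nce_loss(q, d, temperature=0.05):
    """InfoNCE with in-batch negatives: each query's aligned document is
    its positive; every other document in the batch serves as a negative."""
    q = F.normalize(q, dim=-1)
    d = F.normalize(d, dim=-1)
    logits = q @ d.T / temperature       # (B, B) similarity matrix
    labels = torch.arange(q.size(0))     # diagonal entries are positives
    return F.cross_entropy(logits, labels)

def triplet_loss(q, d_pos, d_neg, margin=0.2):
    """Triplet loss on mined hard negatives: push the positive's similarity
    above the hard negative's by at least `margin`."""
    q = F.normalize(q, dim=-1)
    sim_pos = (q * F.normalize(d_pos, dim=-1)).sum(-1)
    sim_neg = (q * F.normalize(d_neg, dim=-1)).sum(-1)
    return F.relu(margin - sim_pos + sim_neg).mean()

# Combined objective over a toy batch of 8 query/document pairs
q, d, d_neg = torch.randn(8, 64), torch.randn(8, 64), torch.randn(8, 64)
loss = info_nce_loss(q, d) + triplet_loss(q, d, d_neg)
```

In-batch negatives make contrastive training cheap (the similarity matrix reuses every document in the batch), while mined hard negatives supply the difficult contrasts that in-batch sampling rarely produces.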
A key technical choice is Matryoshka Representation Learning (MRL), which lets the single trained model emit embeddings at multiple sizes (e.g., 64, 128, or 256 dimensions). This provides flexibility for balancing search quality against computational cost and latency across different production scenarios. The team reports substantial gains in recall, a core retrieval-quality metric, over a strong baseline across six geographic markets. The work provides a comprehensive blueprint covering data curation, model architecture, large-scale training, and evaluation for building a unified, multi-vertical search system at global consumer scale.
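At serving time, the MRL property means a smaller embedding is just the re-normalized prefix of the full one. A minimal sketch of that serving-side truncation, assuming embeddings trained with nested (Matryoshka) objectives so that prefixes remain meaningful; `mrl_embed` is an illustrative helper name, not an API from the article:

```python
import torch
import torch.nn.functional as F

def mrl_embed(full_emb, dim):
    """Matryoshka-style truncation: with MRL training, the first `dim`
    coordinates of the full embedding form a valid smaller embedding
    once re-normalized to unit length."""
    return F.normalize(full_emb[..., :dim], dim=-1)

# Toy batch of 4 full 256-d embeddings (MRL training would also apply the
# contrastive loss at each nested prefix size, e.g. 64 and 128)
full = F.normalize(torch.randn(4, 256), dim=-1)
small = mrl_embed(full, 64)    # cheap 64-d variant for latency-sensitive paths
medium = mrl_embed(full, 128)  # mid-cost variant
```

One model therefore serves every quality/cost tier: low-latency paths index the 64-d prefix, while quality-sensitive paths use the full vector, with no retraining or separate models.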
- Unifies search across stores, dishes, and grocery items in a single multilingual model.
- Fine-tunes a Qwen2 model on hundreds of millions of anonymized user interactions.
- Uses Matryoshka Representation Learning (MRL) to serve multiple embedding sizes from one model for efficiency.
Why It Matters
It demonstrates how to scale high-quality, unified semantic search for a complex, global marketplace, improving discovery for millions of users.