New DINOSAUR method boosts recommender diversity
A new retrieval method called DINOSAUR samples embeddings to reduce popularity bias against long-tail items in recommender systems.
Independent researcher Olivier Jeunen has introduced DINOSAUR, a novel framework designed to address the long-standing bias in recommender systems toward popular 'head' items at the expense of niche 'long-tail' content. Traditional approximate nearest neighbor (ANN) search methods rely on single point-estimate embeddings for users and items, which are inherently noisy due to sparse interaction data. This noise systematically biases retrieval toward well-defined, popular items while overlooking diverse and serendipitous content.
DINOSAUR tackles this by sampling S_i embeddings per item and constructing an index on this augmented set. At query time, user embeddings are also sampled, creating a two-sided stochastic retrieval process that implicitly marginalizes over embedding uncertainty. Critically, this approach doesn’t require changes to existing model architectures or ANN index infrastructure. Jeunen demonstrates that DINOSAUR recovers standard point-estimate retrieval as uncertainty diminishes, while increased embedding variance expands the retrievable latent space for uncertain items. Empirical results show significant coverage gains with only minor trade-offs in offline recall.
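The two-sided sampling idea can be illustrated with a minimal Python sketch. This is not Jeunen's implementation: the function names, the isotropic Gaussian noise model, the exact inner-product search standing in for an ANN index, and the "best score per item" merge rule are all assumptions made for illustration.

```python
import numpy as np

def build_sampled_index(item_means, item_stds, samples_per_item, rng):
    """Draw samples_per_item[i] noisy embeddings for item i (assumed Gaussian).

    Returns the stacked sample vectors and, for each row, the id of the
    item it was sampled from.
    """
    dim = item_means.shape[1]
    vectors, owners = [], []
    for item_id, (mu, sigma, s_i) in enumerate(
        zip(item_means, item_stds, samples_per_item)
    ):
        vectors.append(mu + sigma * rng.standard_normal((s_i, dim)))
        owners.extend([item_id] * s_i)
    return np.vstack(vectors), np.array(owners)

def retrieve(user_mean, user_std, index_vectors, index_owners,
             k, n_user_samples, rng):
    """Sample user embeddings, score every index entry, return top-k item ids."""
    dim = user_mean.shape[0]
    queries = user_mean + user_std * rng.standard_normal((n_user_samples, dim))
    # Exact inner-product scoring; a real system would query an ANN index here.
    scores = queries @ index_vectors.T
    best_per_entry = scores.max(axis=0)
    # Assumed merge rule: an item's score is the best score of any of its samples.
    n_items = int(index_owners.max()) + 1
    best_per_item = np.full(n_items, -np.inf)
    np.maximum.at(best_per_item, index_owners, best_per_entry)
    return np.argsort(-best_per_item)[:k]

rng = np.random.default_rng(0)
item_means = rng.standard_normal((50, 8))
user_mean = rng.standard_normal(8)

# With zero variance, every sample equals the point estimate.
vecs, owners = build_sampled_index(item_means, np.zeros(50), [3] * 50, rng)
top_sampled = retrieve(user_mean, 0.0, vecs, owners, k=5, n_user_samples=4, rng=rng)
top_point = np.argsort(-(item_means @ user_mean))[:5]
```

In this sketch, setting every standard deviation to zero makes each sample identical to its point estimate, so `top_sampled` matches `top_point`. Raising the variance of an item spreads its samples over a wider region of the latent space, which gives it more chances to land near a sampled query.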
- DINOSAUR samples multiple embeddings per item/user to model uncertainty in recommender systems
- Recovers standard retrieval when uncertainty is low but expands retrievable regions when variance is high
- Achieves large coverage gains with small losses in offline recall in empirical tests
Why It Matters
If the reported gains carry over to production systems, DINOSAUR offers a way to reduce bias toward popular items and surface more diverse content without changing existing models or ANN infrastructure.