Negative Data Mining for Contrastive Learning in Dense Retrieval at IKEA.com
67% of popular IKEA searches have zero-click rates above 50%, and a new training method aims to address that.
IKEA researchers Eva Agapaki and Amritpal Singh Gill have introduced a systematic approach to improving dense retrieval for product search, combining contrastive learning with structured negative sampling and scalable LLM-as-a-judge relevance evaluation. Their method, detailed in a paper on arXiv, uses IKEA's hierarchical product taxonomy and product attributes to generate semantically challenging negatives, rather than relying on sparse human annotations or random sampling. The LLM-based evaluation system assigns a relevance score to every candidate product for each query, producing high-quality training data.
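To make the idea of taxonomy-based hard negatives concrete, here is a minimal sketch. It is not the paper's implementation: the `Product` type, the example category paths, the `mine_hard_negatives` helper, and the rule of ranking candidates by how many taxonomy levels they share with the positive are all assumptions chosen for illustration. The sketch also assumes that products in the same leaf category as the positive should not be used as negatives.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Product:
    product_id: str
    category_path: tuple  # root-to-leaf taxonomy path (hypothetical values below)


def shared_depth(path_a, path_b):
    """Count how many leading taxonomy levels two category paths share."""
    depth = 0
    for level_a, level_b in zip(path_a, path_b):
        if level_a != level_b:
            break
        depth += 1
    return depth


def mine_hard_negatives(positive, catalog, k=2):
    """Return up to k products closest to the positive in the taxonomy.

    Assumption: products in the same leaf category as the positive are
    skipped, since they may be relevant rather than true negatives.
    """
    candidates = [
        p for p in catalog if p.category_path != positive.category_path
    ]
    # More shared levels means a harder negative; product_id breaks ties
    # so the result is deterministic.
    candidates.sort(
        key=lambda p: (
            -shared_depth(p.category_path, positive.category_path),
            p.product_id,
        )
    )
    return candidates[:k]


# Toy catalog with made-up products and category paths.
catalog = [
    Product("sofa_a", ("Furniture", "Sofas", "3-seat sofas")),
    Product("sofa_b", ("Furniture", "Sofas", "2-seat sofas")),
    Product("sofa_c", ("Furniture", "Sofas", "3-seat sofas")),
    Product("chair", ("Furniture", "Chairs", "Armchairs")),
    Product("lamp", ("Lighting", "Lamps", "Floor lamps")),
]

negatives = mine_hard_negatives(catalog[0], catalog, k=2)
negative_ids = [p.product_id for p in negatives]
```

In this toy catalog, a sibling-category sofa ranks as a harder negative than an armchair, and an unrelated lamp ranks last, which is the intuition behind preferring structured negatives over random ones.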
Offline experiments on real user queries from the Canada market yielded a +2.6% average improvement in category accuracy over the baseline. However, an A/B test on long-tail queries showed no statistically significant differences in user engagement metrics (p > 0.05). The researchers traced this gap to user search behavior: 67% of popular searches exhibit zero-click rates above 50%, meaning a substantial proportion of search sessions result in no product engagement regardless of ranking quality. The findings underscore the importance of hard negative mining. They also point to the need to ground training data and offline evaluations in real user search behavior, including query intent distribution and zero-click patterns, to bridge the gap between offline retrieval quality and online user engagement.
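The zero-click statistic above can be read as a two-step calculation: first a per-query zero-click rate, then the share of queries whose rate exceeds a threshold. The sketch below illustrates that arithmetic on made-up data. The session log format, the function names, and the toy queries are assumptions, and the log is assumed to be already filtered to popular queries, since the summary does not say how popularity was defined.

```python
from collections import defaultdict


def zero_click_rates(sessions):
    """Map each query to the fraction of its sessions that had no click.

    sessions: iterable of (query, clicked) pairs, where clicked is a bool.
    """
    totals = defaultdict(int)
    zero_clicks = defaultdict(int)
    for query, clicked in sessions:
        totals[query] += 1
        if not clicked:
            zero_clicks[query] += 1
    return {query: zero_clicks[query] / totals[query] for query in totals}


def share_above(rates, threshold=0.5):
    """Fraction of queries whose zero-click rate is strictly above threshold."""
    if not rates:
        return 0.0
    return sum(rate > threshold for rate in rates.values()) / len(rates)


# Toy session log (made-up data, not from the paper).
sessions = [
    ("sofa", False), ("sofa", False), ("sofa", False), ("sofa", True),
    ("desk", False), ("desk", True),
    ("lamp", False), ("lamp", False),
]

rates = zero_click_rates(sessions)
share = share_above(rates, threshold=0.5)
```

Here two of the three toy queries have zero-click rates above 50%, so `share` is two thirds. The article's 67% figure is the analogous share computed over IKEA's popular searches.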
- IKEA researchers used LLM-as-a-judge to score candidate products for training data generation, replacing sparse human annotations.
- Structured negative sampling using product taxonomy and attributes improved offline category accuracy by 2.6% on average on Canada market queries.
- A/B tests on long-tail queries showed no significant engagement improvement, linked to 67% of popular searches having >50% zero-click rates.
Why It Matters
Real-world search systems must account for zero-click behavior: better offline metrics don't always translate into user engagement.