Research & Papers

Naver Labs Europe @ WSDM Cup | Multilingual Retrieval

Their learned sparse retrieval model outperformed dense baselines in the WSDM Cup 2026 benchmark.

Deep Dive

Researchers from Naver Labs Europe have detailed their winning approach to the WSDM Cup 2026 shared task on retrieving multilingual documents from English queries. The team, led by Thibault Formal and Maxime Louis, used the competition as a testbed for their SPLARE model, a learned sparse retrieval system designed to produce generalizable sparse latent representations. Their submission, built up over five progressively enhanced runs, demonstrated that the method could outperform established dense retrieval baselines, challenging the field's prevailing shift toward dense retrieval.
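To make the paradigm concrete, here is a minimal sketch of learned sparse scoring, not SPLARE's implementation: the model maps query and document into vocabulary-sized weight vectors, and relevance is their dot product over overlapping terms, which keeps the representation compatible with a classic inverted index. The toy encoder below just counts terms; everything here is illustrative.

```python
# Minimal sketch of learned sparse retrieval scoring. The toy
# sparse_encode below just counts terms; a real learned model predicts
# a non-negative weight per vocabulary term and can activate terms not
# literally present in the text (term expansion).
from collections import Counter

def sparse_encode(text: str) -> dict[str, float]:
    # Placeholder encoder: term frequencies stand in for learned weights.
    return dict(Counter(text.lower().split()))

def score(query: str, doc: str) -> float:
    # Dot product over shared terms; with real learned weights this is
    # exactly the operation an inverted index evaluates efficiently.
    q, d = sparse_encode(query), sparse_encode(doc)
    return sum(w * d[t] for t, w in q.items() if t in d)

print(score("sparse retrieval", "learned sparse retrieval models"))  # -> 2
```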

The core of their solution was the SPLARE-7B model, augmented with lightweight additions: reranking with the Qwen3-Reranker-4B model and simple score fusion strategies. The results demonstrated the continued relevance and competitiveness of learned sparse retrieval in complex multilingual scenarios, where it beat dense models such as Qwen3-Embedding-8B. This work provides a strong benchmark for cross-lingual generalization and highlights an efficient, high-performance alternative to dense vector search for enterprise-scale information retrieval systems.
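A simple score fusion step of the kind mentioned above might look like the following sketch, assuming per-list min-max normalization and a linear weighted sum; the weights and helper names are assumptions for illustration, not the team's actual configuration.

```python
# Illustrative linear score fusion: normalize each score list so the
# sparse retriever and the reranker become comparable, then mix them.
def min_max(scores: list[float]) -> list[float]:
    """Rescale a score list to [0, 1]."""
    lo, hi = min(scores), max(scores)
    if hi == lo:
        return [0.0] * len(scores)
    return [(s - lo) / (hi - lo) for s in scores]

def fuse(retriever: dict[str, float], reranker: dict[str, float],
         alpha: float = 0.5) -> list[tuple[str, float]]:
    """Weighted sum of normalized scores for documents scored by both
    systems, returned best-first. alpha is a hypothetical mixing weight."""
    docs = [d for d in retriever if d in reranker]
    r = dict(zip(docs, min_max([retriever[d] for d in docs])))
    k = dict(zip(docs, min_max([reranker[d] for d in docs])))
    fused = {d: alpha * r[d] + (1 - alpha) * k[d] for d in docs}
    return sorted(fused.items(), key=lambda x: x[1], reverse=True)
```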

Key Points
  • SPLARE-7B outperformed the dense baseline Qwen3-Embedding-8B in the WSDM Cup 2026 benchmark.
  • The solution took a hybrid approach, combining Qwen3-Reranker-4B reranking with score fusion for a final performance boost (see the reranking sketch after this list).
  • Shows that learned sparse retrieval remains a competitive, efficient architecture for multilingual search tasks.
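As referenced above, the reranking stage in such a hybrid pipeline typically scores each query-document pair jointly with a cross-encoder. The sketch below uses sentence-transformers' CrossEncoder with a small stand-in model; Qwen3-Reranker-4B has its own loading path and prompt format, so treat the model choice here as an assumption.

```python
# Generic cross-encoder reranking sketch. The model name below is a
# small stand-in, NOT Qwen3-Reranker-4B, whose actual interface differs.
from sentence_transformers import CrossEncoder

model = CrossEncoder("cross-encoder/ms-marco-MiniLM-L-6-v2")

def rerank(query: str, candidates: list[str],
           top_k: int = 10) -> list[tuple[str, float]]:
    # Score each (query, document) pair jointly, then sort best-first.
    scores = model.predict([(query, doc) for doc in candidates])
    ranked = sorted(zip(candidates, scores), key=lambda x: x[1], reverse=True)
    return ranked[:top_k]
```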

Why It Matters

Offers a high-performance, efficient alternative to dense embedding models for building scalable, multilingual search systems.