Research & Papers

Reproducing and Comparing Distillation Techniques for Cross-Encoders

A comprehensive study reveals how to make smaller AI models match the power of massive LLMs for search.

Deep Dive

A team of researchers from Université de Toulouse and CNRS has published 'Reproducing and Comparing Distillation Techniques for Cross-Encoders,' a study that clarifies the best methods for boosting the performance of smaller AI models used in information retrieval. The research addresses a critical gap by systematically comparing two prominent knowledge distillation strategies: one that distills knowledge from large language model (LLM) re-rankers (Schlatt et al., 2025) and another that uses an ensemble of strong cross-encoder teachers (Hofstätter et al., 2020). By applying these techniques to a wide range of model backbones—from classic BERT and RoBERTa to modern architectures like DeBERTa-v3 and ModernBERT—the study provides a controlled benchmark for the field.

The results offer clear, actionable guidance for AI engineers. The study found that training objectives emphasizing relative comparisons between documents, specifically pairwise MarginMSE and listwise InfoNCE, consistently outperformed simpler pointwise methods across all tested models and datasets (including TREC-DL, MS MARCO, and BEIR). Crucially, the performance gains from choosing the right objective were as large as those achieved by scaling up to a larger, more complex model backbone. This work clarifies which distillation strategy to choose, showing that with the right objective, efficient cross-encoders can rival the effectiveness of much larger and more computationally expensive LLM re-rankers.
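To make the distinction concrete, here is a minimal sketch of the two winning objectives in plain Python. This is an illustrative reconstruction, not the authors' implementation: the function names, the convention of putting the teacher-preferred document at index 0, and the use of raw floats instead of tensors are all assumptions for clarity.

```python
import math

def margin_mse_loss(student_pos, student_neg, teacher_pos, teacher_neg):
    # Pairwise MarginMSE: penalize the squared difference between the
    # student's score margin (pos - neg) and the teacher's score margin.
    # The student learns *relative* preferences, not absolute scores.
    return ((student_pos - student_neg) - (teacher_pos - teacher_neg)) ** 2

def listwise_infonce_loss(student_scores, positive_idx=0):
    # Listwise InfoNCE: softmax cross-entropy over a candidate list,
    # treating the teacher-preferred document (index 0 here, by assumed
    # convention) as the target class. Computed in log-space for stability.
    m = max(student_scores)
    log_z = m + math.log(sum(math.exp(s - m) for s in student_scores))
    return log_z - student_scores[positive_idx]
```

Note that MarginMSE is minimized (to exactly zero) whenever the student reproduces the teacher's margin, even if its absolute scores differ—which is precisely why such relative objectives transfer ranking behavior better than pointwise score regression.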

Key Points
  • Pairwise MarginMSE and listwise InfoNCE objectives outperformed pointwise baselines across all model architectures tested, including BERT, RoBERTa, and ModernBERT.
  • The performance gain from selecting the optimal distillation objective was found to be comparable to the gain achieved by scaling up the underlying model backbone itself.
  • The study provides a reproducible benchmark, showing efficient cross-encoders can match LLM re-ranker performance, enabling more cost-effective and scalable search systems.
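In deployment, a distilled cross-encoder slots into the standard retrieve-then-rerank pipeline: a cheap first stage (e.g., BM25) fetches candidates, and the cross-encoder re-scores each (query, document) pair. The sketch below assumes a hypothetical `score_fn` standing in for the distilled model; everything here is illustrative, not from the paper.

```python
def rerank(query, candidates, score_fn, top_k=10):
    # Re-score every (query, document) candidate pair with the
    # cross-encoder, then return the top_k documents by descending score.
    scored = sorted(candidates, key=lambda doc: score_fn(query, doc), reverse=True)
    return scored[:top_k]

# Toy stand-in scorer (word overlap) just to exercise the loop; a real
# system would call the distilled cross-encoder here.
def toy_score(query, doc):
    return len(set(query.split()) & set(doc.split()))
```

Because the cross-encoder sees only the short candidate list rather than the whole corpus, its cost stays bounded regardless of collection size, which is what makes the efficiency claims above practical.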

Why It Matters

Enables developers to build high-accuracy search and retrieval systems without the prohibitive cost and latency of running massive LLMs.