Developer Tools

DeepSeek, Qwen, Mistral Code Rewriting Boosts Retrieval by Up to 0.51 NDCG@10

Full natural language rewriting beats copy-paste for code retrieval, but only when applied to queries and corpus together.

Deep Dive

A new paper by Gurioli et al. systematically evaluates LLM-based code rewriting for retrieval, comparing three rewriting strategies under two augmentation regimes: joint query-corpus (QC, online) and corpus-only (C, offline). Using five encoders and three LLM families (Qwen, DeepSeek, Mistral) across six CoIR benchmarks, they found that full natural language rewriting under QC gave the largest improvement: +0.51 NDCG@10 on the CT-Contest benchmark for the MoSE-18 encoder. In contrast, corpus-only rewriting backfired, degrading retrieval in 56 of 90 configurations (about 62%).

The authors also introduce two diagnostics: Delta H (token entropy) and Delta s (embedding cosine). Delta H was reported as a reliable predictor of retrieval gain under QC across all three rewriter families (pooled Spearman rho = +0.436, p < 0.001 on DeepSeek+Codestral; rho = +0.593 on Codestral alone).
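To make the two diagnostics more concrete, here is a minimal sketch of the quantities they appear to build on. This summary does not give the paper's exact definitions, so everything here is an assumption: Delta H is taken to be the change in Shannon entropy of a text's token distribution after rewriting, tokens are split on whitespace rather than with a model tokenizer, and only the cosine helper is shown for Delta s because how it is aggregated is not stated. The function names are illustrative, not the authors'.

```python
import math
from collections import Counter


def token_entropy(text: str) -> float:
    """Shannon entropy (in bits) of the whitespace-token distribution.

    Assumption: whitespace tokenization stands in for whatever
    tokenizer the paper actually uses.
    """
    tokens = text.split()
    if not tokens:
        return 0.0
    total = len(tokens)
    counts = Counter(tokens)
    return sum(-(c / total) * math.log2(c / total) for c in counts.values())


def delta_h(original: str, rewritten: str) -> float:
    """Assumed form of Delta H: entropy after rewriting minus entropy before."""
    return token_entropy(rewritten) - token_entropy(original)


def cosine(u, v) -> float:
    """Cosine similarity between two equal-length embedding vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)
```

Under these assumptions, a rewrite that replaces repetitive code tokens with more varied natural-language tokens yields a positive Delta H, which is the direction the reported correlation associates with retrieval gain.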

The study reframes LLM rewriting as a cost-benefit decision: it works best as a remediation layer for lightweight encoders tackling code-dominant queries, with diminishing returns for stronger encoders or natural-language-heavy queries. Delta H provides a low-cost, rewriter-agnostic proxy for deciding up front whether rewriting will help, saving LLM calls when gains are unlikely. The research offers practical guidance for improving code retrieval systems without blindly applying LLM rewrites everywhere.
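One way such a cost-benefit gate might look in practice is sketched below. It is not the authors' procedure: it assumes you rewrite only a small pilot sample, measure the mean entropy change on it, and commit to full rewriting only if that mean clears a cutoff. The `should_rewrite` name, the pilot-sample approach, and the default `threshold` of 0.0 are all hypothetical, and a real cutoff would have to be calibrated on your own data.

```python
import math
from collections import Counter
from statistics import mean


def token_entropy(text: str) -> float:
    """Shannon entropy (bits) over whitespace tokens (a simplifying assumption)."""
    tokens = text.split()
    if not tokens:
        return 0.0
    total = len(tokens)
    return sum(-(c / total) * math.log2(c / total)
               for c in Counter(tokens).values())


def should_rewrite(pilot_pairs, threshold: float = 0.0) -> bool:
    """Decide whether to rewrite everything, based on a small pilot sample.

    pilot_pairs: list of (original, rewritten) strings from the pilot run.
    threshold:   hypothetical cutoff on the mean entropy change.
    """
    if not pilot_pairs:
        return False
    changes = [token_entropy(rew) - token_entropy(orig)
               for orig, rew in pilot_pairs]
    return mean(changes) > threshold


# Usage: the pilot rewrite raised token entropy, so the gate says "go ahead".
pilot = [("x x x x", "sum the values in the list")]
decision = should_rewrite(pilot)
```

The design choice is to spend a handful of LLM calls on the pilot so that the much larger bill for rewriting every query and document is only paid when the proxy points the right way.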

Key Points
  • Full NL rewriting of both queries and corpus gave +0.51 absolute NDCG@10 on CT-Contest for the MoSE-18 encoder.
  • Corpus-only rewriting degraded retrieval in 56 of 90 test configurations (62%), showing the risk of offline rewriting.
  • Delta H (token entropy) correlates with retrieval gain (Spearman rho = +0.593 on Codestral), enabling cheap pre-retrieval decision-making.
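For readers unfamiliar with the metric behind these numbers, here is a minimal sketch of NDCG@10 in its common linear-gain form. It assumes the input list holds the graded relevance of each retrieved item in ranked order and that the list covers every relevant item. The benchmark's exact variant is not specified in this summary.

```python
import math


def dcg_at_k(relevances, k: int = 10) -> float:
    """Discounted cumulative gain over the top-k ranked relevances."""
    return sum(rel / math.log2(rank + 1)
               for rank, rel in enumerate(relevances[:k], start=1))


def ndcg_at_k(relevances, k: int = 10) -> float:
    """DCG normalized by the DCG of the ideal (descending) ordering.

    Assumption: `relevances` includes all relevant items, so sorting it
    gives the ideal ranking.
    """
    ideal = dcg_at_k(sorted(relevances, reverse=True), k)
    return dcg_at_k(relevances, k) / ideal if ideal > 0 else 0.0
```

With non-negative relevance labels the score lies between 0 and 1, so an absolute gain of +0.51 spans more than half of the metric's full range.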

Why It Matters

LLM rewriting can dramatically improve code search, but blindly rewriting codebases without query context backfires.