Retrieval-Feedback-Driven Distillation and Preference Alignment for Efficient LLM-based Query Expansion
Researchers compress DeepSeek-685B's search intelligence into Qwen3-4B, cutting costs while preserving 97% of its retrieval performance.
Researchers Minghan Li and Guodong Zhou have introduced a novel framework that dramatically reduces the computational cost of using large language models for search query expansion. Their approach uses a two-stage process: first distilling knowledge from a massive 685B-parameter DeepSeek model into a compact 4B-parameter Qwen3 model, then applying Direct Preference Optimization (DPO) to align the smaller model's outputs with retrieval objectives. This creates a practical solution where the student model achieves 97% of the teacher's nDCG@10 performance on the TREC DL19 benchmark while being orders of magnitude more efficient to run.
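For orientation, the second stage optimizes a standard DPO objective over preference pairs of query expansions. The sketch below computes that loss for a single chosen/rejected pair from summed log-probabilities; the function name, the beta value, and the toy numbers are illustrative assumptions rather than the paper's exact setup.

```python
import math

def dpo_loss(logp_chosen, logp_rejected,
             ref_logp_chosen, ref_logp_rejected, beta=0.1):
    """Standard DPO loss for one preference pair (general form; hyperparameters
    here are assumed, not taken from the paper).

    logp_*     : summed log-probability of the chosen / rejected expansion
                 under the student (policy) model.
    ref_logp_* : the same quantities under the frozen reference model
                 (e.g., the distilled student before DPO).
    """
    # Implicit reward margin: how much more the policy prefers the chosen
    # expansion over the rejected one, relative to the reference model.
    margin = beta * ((logp_chosen - ref_logp_chosen)
                     - (logp_rejected - ref_logp_rejected))
    # Negative log-sigmoid of the margin; minimizing it pushes the student
    # toward expansions associated with better retrieval.
    return -math.log(1.0 / (1.0 + math.exp(-margin)))

# Toy example: chosen expansion is slightly more likely under the policy
# than under the reference, the rejected one slightly less likely.
print(dpo_loss(-42.0, -55.0, -44.0, -53.0))
```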
The method's innovation lies in its retrieval-feedback-driven approach to constructing training data. Rather than relying on human-labeled examples, the system automatically builds preference pairs from differences in retrieval metrics (nDCG@10): expansions that retrieve better documents become "chosen" examples, weaker ones become "rejected", teaching the student model what makes an expansion effective for search. This automated preference construction lets the model learn directly from retrieval performance signals, closing a tight feedback loop between query generation and search effectiveness.
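A minimal sketch of how such pairs could be assembled, assuming nDCG@10 is computed from graded relevance labels of the documents each expanded query retrieves; the `build_preference_pairs` helper and the `min_gap` threshold are hypothetical, since the paper's exact pairing rule is not reproduced here.

```python
import math
from itertools import combinations

def ndcg_at_k(relevances, k=10):
    """nDCG@k for one ranked list of graded relevance labels."""
    dcg = sum(rel / math.log2(i + 2) for i, rel in enumerate(relevances[:k]))
    ideal = sum(rel / math.log2(i + 2)
                for i, rel in enumerate(sorted(relevances, reverse=True)[:k]))
    return dcg / ideal if ideal > 0 else 0.0

def build_preference_pairs(candidates, min_gap=0.05):
    """Turn scored expansion candidates into chosen/rejected DPO pairs.

    candidates: list of (expansion_text, ranked_relevance_labels), where the
    labels come from running the retriever on the expanded query and looking
    up relevance judgments. `min_gap` is an assumed threshold on the nDCG@10
    difference required before a pair is kept.
    """
    scored = [(text, ndcg_at_k(rels)) for text, rels in candidates]
    pairs = []
    for (a_text, a_score), (b_text, b_score) in combinations(scored, 2):
        if abs(a_score - b_score) >= min_gap:
            chosen, rejected = ((a_text, b_text) if a_score > b_score
                                else (b_text, a_text))
            pairs.append({"chosen": chosen, "rejected": rejected})
    return pairs

# Toy example: two candidate expansions for one query.
print(build_preference_pairs([
    ("query + expansion A", [3, 2, 0, 0, 1]),
    ("query + expansion B", [0, 1, 0, 0, 0]),
]))
```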
Experiments demonstrate strong cross-lingual capabilities, with the distilled model maintaining effectiveness on both English (TREC DL19/20/21) and Chinese (MIRACL-zh) benchmarks. The framework represents a significant step toward making LLM-powered search enhancement practical for production systems, where inference costs and latency are critical constraints. By compressing a 685B model's capabilities into a 4B model with minimal performance loss, the research opens doors for deploying sophisticated query expansion in resource-constrained environments.
- Distills the 685B DeepSeek model into a 4B Qwen3 model while retaining 97% of nDCG@10 performance
- Uses automated retrieval-metric-driven preference construction instead of human labeling
- Demonstrates effectiveness across both English (TREC) and Chinese (MIRACL-zh) search benchmarks
Why It Matters
Makes high-quality AI-powered search enhancement practical for production systems by dramatically reducing computational costs.