Research & Papers

Can QPP Choose the Right Query Variant? Evaluating Query Variant Selection for RAG Pipelines

New research reveals a utility gap between retrieval and generation in RAG systems.

Deep Dive

A new study by Negar Arabzadeh, Andrew Drozdov, Michael Bendersky, and Matei Zaharia investigates whether Query Performance Prediction (QPP) can select the best query variant in Retrieval-Augmented Generation (RAG) pipelines. LLMs often generate multiple semantically equivalent reformulations of a query, but running the full pipeline for each is costly. The researchers propose using QPP to identify the most promising variant before retrieval and generation, reframing QPP as intra-topic discrimination (ranking variants of the same information need against each other) rather than traditional cross-topic difficulty estimation.
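Conceptually, the selection step amounts to scoring each variant with a predictor and running the pipeline only on the winner. A minimal Python sketch, assuming a `qpp_score` callable and a toy predictor (all names and the example queries are illustrative, not from the paper):

```python
def select_variant(variants, qpp_score):
    """Pick the query variant with the highest predicted performance,
    so retrieval and generation run only once instead of per variant."""
    return max(variants, key=qpp_score)

# Illustrative usage: a toy predictor that favors longer, more specific queries.
variants = [
    "effects of caffeine",
    "what are the physiological effects of caffeine on sleep",
    "caffeine sleep impact",
]
best = select_variant(variants, qpp_score=lambda q: len(q.split()))
# Only `best` proceeds to retrieval + generation.
```

In a real pipeline, `qpp_score` would be one of the pre- or post-retrieval predictors evaluated in the study; the argmax structure stays the same.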

Through large-scale experiments on TREC-RAG with both sparse and dense retrievers, the team evaluated pre- and post-retrieval predictors. They uncovered a systematic 'utility gap': variants that maximize ranking metrics like nDCG often fail to produce the best generated answers, highlighting misalignment between retrieval relevance and generation fidelity. However, QPP can reliably identify variants that improve end-to-end quality over the original query. Notably, lightweight pre-retrieval predictors frequently match or outperform more expensive post-retrieval methods, offering a latency-efficient approach for robust RAG systems.

Key Points
  • QPP can select optimal query variants before running full RAG pipelines, reducing computational costs.
  • A 'utility gap' exists where retrieval metrics (e.g., nDCG) don't align with generation quality in RAG.
  • Lightweight pre-retrieval predictors match or beat post-retrieval methods, enabling faster RAG optimization.
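To illustrate how cheap a pre-retrieval predictor can be, here is a sketch of average IDF, a classic pre-retrieval QPP signal that needs only corpus term statistics and no retrieval run. The toy corpus statistics below are invented for illustration; the paper's actual predictors and data are not reproduced here:

```python
import math
from collections import Counter

def avg_idf(query, doc_freqs, num_docs):
    """Average inverse document frequency of the query terms:
    rarer (more discriminative) terms -> higher predicted effectiveness."""
    terms = query.lower().split()
    idfs = [math.log(num_docs / (1 + doc_freqs.get(t, 0))) for t in terms]
    return sum(idfs) / len(idfs)

# Toy statistics: document frequency per term over num_docs documents.
doc_freqs = Counter({"caffeine": 40, "sleep": 120, "effects": 300, "adenosine": 5})
num_docs = 1000

scores = {q: avg_idf(q, doc_freqs, num_docs)
          for q in ["caffeine effects", "caffeine adenosine sleep"]}
# The variant containing rarer, more specific terms receives the higher score.
```

Because such signals are computed from precomputed index statistics, they add negligible latency, which is what makes their competitiveness with post-retrieval methods notable.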

Why It Matters

Enables faster, cheaper RAG by predicting query utility before retrieval, benefiting latency-sensitive, real-time AI applications.