Lightweight Query Routing for Adaptive RAG: A Baseline Study on RAGRouter-Bench

A simple TF-IDF + SVM classifier beats semantic embeddings at query routing and cuts simulated token costs by sending each query to the right retrieval strategy.

Deep Dive

Researchers Prakhar Bansal and Shivangi Agarwal have established the first systematic baseline for query routing in Retrieval-Augmented Generation (RAG) systems. Their study, 'Lightweight Query Routing for Adaptive RAG,' tackles a core efficiency problem: RAG pipelines can use different retrieval strategies with varying costs and capabilities, and selecting a more expensive strategy than a query needs wastes tokens. The team trained and evaluated five classical machine learning classifiers on three feature types (TF-IDF, MiniLM sentence embeddings, and hand-crafted features) using the newly released RAGRouter-Bench. This benchmark contains 7,727 queries across four domains, including medical and legal, each labeled by type: factual, reasoning, or summarization.
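
To make the setup concrete, here is a minimal sketch of a TF-IDF + SVM query-type router built with scikit-learn. It is an illustration under stated assumptions, not the authors' implementation: the toy queries are invented rather than drawn from RAGRouter-Bench, and the linear kernel, n-gram range, and other hyperparameters are guesses, since the summary does not specify them.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.pipeline import make_pipeline
from sklearn.svm import LinearSVC

# Toy training data, invented for illustration (NOT from RAGRouter-Bench).
train_queries = [
    "What is the maximum daily dose of ibuprofen?",
    "Who signed the contract on behalf of the tenant?",
    "When was the statute last amended?",
    "Why would the court reject this appeal given the precedent?",
    "How does drug A interact with drug B in patients with kidney disease?",
    "Compare the liability of the two parties under the agreement.",
    "Summarize the main findings of the clinical trial.",
    "Give an overview of the key clauses in this lease.",
    "Summarize the patient's treatment history.",
]
train_labels = ["factual"] * 3 + ["reasoning"] * 3 + ["summarization"] * 3

# Assumed configuration: word uni- and bigrams fed to a linear SVM.
router = make_pipeline(
    TfidfVectorizer(lowercase=True, ngram_range=(1, 2)),
    LinearSVC(C=1.0),
)
router.fit(train_queries, train_labels)


def route(query: str) -> str:
    """Return the predicted query type for a single query string."""
    return str(router.predict([query])[0])


print(route("Summarize the ruling in two sentences."))
```

A downstream system would map each predicted type to a retrieval strategy. With only nine training examples the predictions here are not meaningful; the point is the shape of the pipeline.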

The results are striking in their simplicity. The best-performing configuration was a Support Vector Machine (SVM) using basic TF-IDF (keyword-based) features, achieving a macro-averaged F1 score of 0.928 and 93.2% accuracy. This simple model outperformed classifiers using more modern semantic sentence embeddings by 3.1 macro-F1 points, suggesting that surface-level keyword patterns are strong predictors of query type. In a cost simulation, this router achieved 28.1% token savings compared to always using the most expensive retrieval paradigm. The study also found that routing difficulty varies by domain, with medical queries being the hardest and legal queries the easiest to classify correctly.
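
The token-savings figure can be read as a simple ratio: tokens spent when each query goes to the strategy chosen by the router, divided by tokens spent when every query goes to the most expensive strategy. The sketch below shows that arithmetic. The per-type token costs and the function name `token_savings` are hypothetical; the summary does not report the cost model used in the study's simulation, and this version ignores any quality penalty from misrouted queries.

```python
from typing import Dict, Sequence


def token_savings(predicted_types: Sequence[str],
                  cost_by_type: Dict[str, int]) -> float:
    """Fraction of tokens saved versus always using the costliest strategy."""
    if not predicted_types:
        raise ValueError("need at least one routed query")
    # Baseline: every query pays the cost of the most expensive strategy.
    baseline = max(cost_by_type.values()) * len(predicted_types)
    # Routed: each query pays the cost of the strategy for its predicted type.
    routed = sum(cost_by_type[t] for t in predicted_types)
    return 1.0 - routed / baseline


# Hypothetical token costs per query type (illustrative numbers only).
costs = {"factual": 400, "reasoning": 1000, "summarization": 800}
predictions = ["factual", "reasoning", "summarization", "factual"]

# Routed cost is 2600 tokens against a 4000-token baseline.
print(f"{token_savings(predictions, costs):.1%}")  # prints 35.0%
```

The saving depends heavily on the mix of query types and on how far apart the strategy costs are, so the number from any one simulation should not be expected to carry over unchanged to a different workload.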

This research provides a reproducible, high-performance baseline for adaptive RAG systems. It demonstrates that significant cost savings are achievable with lightweight, interpretable models, challenging the assumption that complex neural networks are always necessary for this task. The advantage of lexical over semantic features is a notable finding in its own right, and the authors point to 'corpus-aware routing' as the direction for future work needed to close the remaining gap. This work gives developers a practical starting point for building more efficient and cost-effective AI applications.

Key Points
  • A TF-IDF + SVM model achieved 0.928 macro-F1, 93.2% accuracy, and 28.1% simulated token savings on RAGRouter-Bench.
  • Simple lexical features (TF-IDF) outperformed semantic sentence embeddings (MiniLM) by 3.1 macro-F1 points.
  • Analysis of 7,727 queries found medical queries hardest and legal queries easiest for query-type routing.

Why It Matters

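The study shows developers can build cheaper RAG systems by routing each query to a suitable retrieval strategy. In the study's simulation, this cut token usage by 28.1% compared with always using the most expensive retrieval paradigm.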