Research & Papers

When to Retrieve During Reasoning: Adaptive Retrieval for Large Reasoning Models

AI reasoning models now know when to ask for help, cutting retrieval calls by 47%

Deep Dive

A team from the University of Hong Kong has published ReaLM-Retrieve, a novel framework designed to align retrieval-augmented generation (RAG) with the multi-step reasoning chains of large reasoning models (LRMs) like DeepSeek-R1 and OpenAI o1. Traditional RAG systems inject all context upfront, but LRMs need evidence at specific points during their extended chains of thought. ReaLM-Retrieve solves this with a step-level uncertainty detector that pinpoints knowledge gaps at reasoning-step granularity, a retrieval intervention policy that decides when external evidence is most beneficial, and an efficiency-optimized integration that reduces per-retrieval overhead by 3.2x compared to naive approaches.
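The paper's exact detector and policy are not shown here, but the control flow they describe can be illustrated with a toy sketch. Everything below is an assumption for illustration: `step_uncertainty` is a stand-in for the step-level uncertainty detector (here a crude hedge-word heuristic, not the authors' method), and `retrieve` is a placeholder retriever. The key idea is that retrieval fires only at reasoning steps whose uncertainty crosses a threshold, rather than upfront or at fixed intervals.

```python
# Illustrative sketch of step-level adaptive retrieval (not ReaLM-Retrieve's
# actual implementation). step_uncertainty and retrieve are hypothetical
# stand-ins for the paper's uncertainty detector and retriever.

def step_uncertainty(step: str) -> float:
    """Toy proxy for a knowledge-gap signal: hedging phrases score high."""
    hedges = ("unsure", "not certain", "need to check", "unknown")
    return 1.0 if any(h in step.lower() for h in hedges) else 0.1

def adaptive_retrieve(steps, retrieve, threshold=0.5):
    """Walk the reasoning chain, retrieving only at high-uncertainty steps."""
    evidence, calls = [], 0
    for step in steps:
        if step_uncertainty(step) > threshold:
            evidence.append(retrieve(step))  # intervene only where needed
            calls += 1
    return evidence, calls

steps = [
    "The question asks who directed the film.",
    "I am unsure which studio released it; need to check.",
    "Combining both facts yields the answer.",
]
evidence, calls = adaptive_retrieve(steps, retrieve=lambda q: f"docs for: {q}")
# Only the uncertain middle step triggers retrieval, so calls == 1
```

In the real system the uncertainty signal would come from the model itself (e.g., its step-level confidence), but the gating structure is the same: fewer, better-targeted retrieval calls than a fixed-interval schedule.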

Tested on MuSiQue, HotpotQA, and 2WikiMultiHopQA, ReaLM-Retrieve delivers a 10.1% absolute improvement in answer F1 over standard RAG (ranging from 9.0% to 11.8% across benchmarks). On the challenging MuSiQue dataset, which requires 2-4 hop reasoning, it achieves 71.2% F1 with only 1.8 retrieval calls per question. It also cuts retrieval calls by 47% compared to fixed-interval baselines like IRCoT while improving retrieval quality, reaching 81.3% Recall@5 along with higher precision and MRR. Accepted at SIGIR 2026, this work sets a new efficiency-accuracy standard for reasoning-intensive retrieval tasks.

Key Points
  • ReaLM-Retrieve uses a step-level uncertainty detector to identify knowledge gaps during reasoning, not just at token or sentence level.
  • It achieves a 10.1% absolute F1 improvement over standard RAG across MuSiQue, HotpotQA, and 2WikiMultiHopQA.
  • The framework reduces retrieval calls by 47% versus fixed-interval methods like IRCoT and lowers per-retrieval overhead by 3.2x.

Why It Matters

This adaptive retrieval approach makes AI reasoning models smarter and cheaper, enabling complex multi-hop QA with fewer resources.