Research & Papers

UMass Amherst researchers’ Critic-R boosts agentic search with introspective feedback

A new framework lets retrieval models self-correct using natural language critiques

Deep Dive

Agentic search systems, in which a reasoning agent iteratively queries a retrieval model, are hard to optimize: improving the retriever often requires expensive co-training or gold-standard annotations. To address this, researchers from UMass Amherst (Md Zarif Ul Alam, Alireza Salemi, Hamed Zamani) propose Critic-R, a framework that explicitly closes the feedback loop between the reasoning agent and the retriever. Critic-R introduces a trained critic model that examines the agent’s introspective reasoning trace after the agent has consumed the retrieved evidence, and judges whether that context sufficiently supports the next reasoning step. The critic’s natural language feedback lets the system self-correct without human intervention.
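To make the critic's role concrete, here is a minimal Python sketch of the interface described above: it reads the agent's reasoning trace and returns a sufficiency verdict plus a natural language critique. The names `CriticVerdict`, `toy_critic` and `UNCERTAINTY_MARKERS` are hypothetical, and the keyword check is only a stand-in for illustration. The critic in Critic-R is a trained model, and this summary does not describe its prompts or output format.

```python
from dataclasses import dataclass


@dataclass
class CriticVerdict:
    sufficient: bool  # does the evidence support the agent's next step?
    feedback: str     # natural language critique passed back toward retrieval


# Hypothetical phrases an agent might write when evidence is missing.
# Assumption for illustration only; a trained critic would not use a keyword list.
UNCERTAINTY_MARKERS = ("not mentioned", "cannot find", "unclear", "no information")


def toy_critic(reasoning_trace: str) -> CriticVerdict:
    """Judge a reasoning trace by looking for signs that evidence was missing."""
    lowered = reasoning_trace.lower()
    hits = [marker for marker in UNCERTAINTY_MARKERS if marker in lowered]
    if hits:
        return CriticVerdict(
            sufficient=False,
            feedback=f"Agent reported a gap ('{hits[0]}'); retrieve more specific evidence.",
        )
    return CriticVerdict(sufficient=True, feedback="Evidence appears to support the next step.")


gap = toy_critic("The passages describe the director, but the birth year is not mentioned.")
ok = toy_critic("The passage states that she was born in 1950.")
```

In this toy run, `gap.sufficient` is false and `ok.sufficient` is true. The point is the shape of the signal (a verdict plus free-text feedback), not the heuristic used to produce it.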

Critic-R comprises two complementary mechanisms. Critic-R-Zero operates at inference time, iteratively rewriting queries and retrieval instructions based on the critic’s feedback. Critic-Embed optimizes the retrieval model’s embeddings by using successful and failed refinement trajectories as automatic supervision, which eliminates the need for manual relevance annotations. Evaluated on four multi-hop QA benchmarks (HotpotQA, 2WikiMultihopQA, MuSiQue, Bamboogle), Critic-R is reported to significantly improve both retrieval quality and downstream answer accuracy. The work points toward more autonomous, efficient agentic search pipelines that learn from their own mistakes.
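The two mechanisms can be sketched together in Python. The loop below mirrors the Critic-R-Zero idea (retrieve, reason, critique, rewrite, repeat), and `mine_pairs` shows one plausible way a logged trajectory could become label-free training triples in the spirit of Critic-Embed. Everything here is an assumption for illustration: the function names, the callable signatures, the `max_rounds` cap, and in particular the rule that documents from accepted rounds count as positives and documents from rejected rounds as negatives. This summary does not say how the authors construct their training signal.

```python
from typing import Callable, List, Tuple

Attempt = Tuple[str, List[str], bool]  # (query used, docs retrieved, critic accepted?)


def refine_and_collect(
    query: str,
    retrieve: Callable[[str], List[str]],
    reason: Callable[[str, List[str]], str],
    critique: Callable[[str], Tuple[bool, str]],
    rewrite: Callable[[str, str], str],
    max_rounds: int = 3,
) -> Tuple[List[str], List[Attempt]]:
    """Inference-time refinement: rewrite the query until the critic accepts."""
    attempts: List[Attempt] = []
    current, docs = query, []
    for _ in range(max_rounds):
        docs = retrieve(current)
        trace = reason(query, docs)      # agent's introspective reasoning trace
        accepted, feedback = critique(trace)
        attempts.append((current, docs, accepted))
        if accepted:
            break
        current = rewrite(current, feedback)  # critic feedback drives the rewrite
    return docs, attempts


def mine_pairs(original_query: str, attempts: List[Attempt]) -> List[Tuple[str, str, str]]:
    """Assumed pairing rule: accepted-round docs are positives, rejected-round docs negatives."""
    good = [d for _, docs, accepted in attempts if accepted for d in docs]
    bad = [d for _, docs, accepted in attempts if not accepted for d in docs if d not in good]
    return [(original_query, pos, neg) for pos in good for neg in bad]


# Toy stand-ins for the retriever, agent, critic and rewriter (all hypothetical).
corpus = {
    "capital of the country": ["France is a country in Europe."],
    "capital of France": ["Paris is the capital of France."],
}
retrieve = lambda q: corpus.get(q, [])
reason = lambda q, docs: "found" if any("Paris" in d for d in docs) else "capital not mentioned"
critique = lambda trace: (trace == "found", "Name the country explicitly.")
rewrite = lambda q, feedback: "capital of France"

docs, attempts = refine_and_collect("capital of the country", retrieve, reason, critique, rewrite)
pairs = mine_pairs("capital of the country", attempts)
```

In this toy run the first query is rejected, the rewritten query is accepted on the second round, and `pairs` holds a single triple that could feed a contrastive loss for the embedding model. A real system would replace each lambda with a model call.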

Key Points
  • Critic-R uses a trained critic model that reads the agent’s introspective reasoning trace and returns natural language feedback on whether the retrieved evidence is sufficient
  • Critic-R-Zero refines queries at inference time; Critic-Embed trains retrieval models without manual relevance labels
  • Outperforms baselines on HotpotQA, 2WikiMultihopQA, MuSiQue, and Bamboogle in retrieval quality and answer accuracy

Why It Matters

Self-improving retrieval systems that learn from their mistakes reduce reliance on expensive human annotations in agentic AI.