Research & Papers

SubSearch: Intermediate Rewards for Unsupervised Guided Reasoning in Complex Retrieval

New method trains LLMs to plan better reasoning steps without human supervision, boosting complex query performance.

Deep Dive

A team of researchers including Roxana Petcu, Evangelos Kanoulas, and Maarten de Rijke has developed SubSearch, a novel framework that addresses a fundamental challenge in AI: how to get large language models (LLMs) to perform reliable, multi-step reasoning for complex information retrieval tasks. Current approaches often rely on reinforcement learning based solely on final outcomes, or they require expensive human-annotated data to train separate reward models that judge each reasoning step. SubSearch innovates by shifting to "intrinsic process rewards": internally derived signals that directly optimize the LLM generator to plan high-quality reasoning paths, eliminating the need for any external supervision.
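
The article does not spell out SubSearch's exact reward formulation, but the core idea, dense, internally derived rewards for intermediate steps blended with the sparse final-outcome signal, can be sketched roughly as follows. Everything in this snippet (the `Step` fields, using the change in answer likelihood as the intrinsic signal, the `alpha` mixing weight) is an illustrative assumption for exposition, not the paper's actual implementation.

```python
from dataclasses import dataclass


@dataclass
class Step:
    """One intermediate reasoning step: a generated sub-query plus its retrieved evidence."""
    sub_query: str
    retrieval_score: float    # how well retrieved passages match the sub-query (hypothetical signal)
    answer_likelihood: float  # model's estimated probability of the final answer given evidence so far


def intrinsic_step_rewards(steps: list[Step]) -> list[float]:
    """Assign each step a reward from internally derived signals only,
    with no human labels and no separately trained reward model.

    Here the signal is the improvement in answer likelihood contributed by
    each step, a stand-in for whatever intrinsic signal the generator exposes.
    """
    rewards = []
    prev = 0.0
    for step in steps:
        rewards.append(step.answer_likelihood - prev)  # credit the step for the progress it adds
        prev = step.answer_likelihood
    return rewards


def trajectory_return(steps: list[Step], outcome_reward: float, alpha: float = 0.5) -> float:
    """Blend dense intermediate rewards with the sparse final-outcome reward,
    instead of optimizing on the outcome alone."""
    return alpha * sum(intrinsic_step_rewards(steps)) + (1 - alpha) * outcome_reward
```

In a training loop, per-step rewards of this kind would supply the dense credit-assignment signal for updating the generator's policy, which is what removes the need for a human-supervised process reward model.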

Experiments conducted across seven established benchmarks demonstrate that rewarding these intermediate reasoning steps leads to significantly more robust and reliable reasoning traces, particularly on question-answering (QA) and multi-hop QA datasets. This represents a move toward more autonomous, information-intensive reasoning. The practical impact is substantial: SubSearch can help build AI agents that are better at integrating with tools like search engines to answer complex, multi-faceted queries. Furthermore, it provides a more data-efficient pathway than supervised process modeling, which could accelerate the development of capable reasoning systems without the bottleneck of human annotation.

Key Points
  • Uses intrinsic intermediate rewards to guide LLM reasoning without external human supervision or separate reward models.
  • Tested on seven benchmarks, showing improved robustness in QA and multi-hop reasoning tasks over outcome-only methods.
  • Enables better agent integration with search tools for complex queries and offers a data-efficient alternative to supervised training.

Why It Matters

This moves AI closer to autonomous, reliable complex reasoning without the high cost and bottleneck of human supervision.