Learning to Retrieve from Agent Trajectories
New research shows AI agents use search differently than humans, requiring a new training paradigm.
A team from Renmin University of China and Microsoft has published a pivotal paper, 'Learning to Retrieve from Agent Trajectories,' introducing a new paradigm for training information retrieval (IR) systems. The core argument is that traditional IR models, trained on human interaction data like clicks and dwell time, are fundamentally mismatched for the era of AI-powered search agents. These agents—LLMs that perform multi-step reasoning and actions—issue queries and consume results in patterns unlike those of human searchers. The paper proposes that retrieval models must be trained directly on agent behavior data to close this performance gap.
To operationalize this, the researchers developed LRAT (Learning to Retrieve from Agent Trajectories), a framework that mines high-quality training signals from the complete trajectories of AI agents. It identifies key behavioral signals that reveal a document's true utility to an agent, such as which documents an agent browses in-depth, which it rejects without reading, and the reasoning traces it generates after consuming information. LRAT uses these signals to create a weighted optimization objective that captures 'relevance intensity,' not just binary relevance.
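The weighted objective described above can be sketched in miniature. Everything here is an illustrative assumption rather than the paper's actual formulation: the function name, the per-document "intensity" weights standing in for graded relevance, and the loss form (an intensity-weighted softmax cross-entropy over one query's candidates).

```python
import math

def weighted_retrieval_loss(scores, intensities):
    """Hypothetical intensity-weighted contrastive loss for one query.

    scores      : retriever similarity score for each candidate document.
    intensities : relevance-intensity weight per document (0 = no useful
                  signal; larger = stronger evidence of utility to the
                  agent). A stand-in for the graded signal LRAT mines
                  from trajectories, not the paper's exact weighting.

    Each document with positive weight contributes -w * log softmax(score),
    so documents the agent engaged with deeply dominate the gradient,
    instead of every "relevant" document counting equally.
    """
    z = max(scores)  # subtract the max for numerical stability
    denom = sum(math.exp(s - z) for s in scores)
    loss, total_w = 0.0, 0.0
    for s, w in zip(scores, intensities):
        if w > 0:
            loss += -w * math.log(math.exp(s - z) / denom)
            total_w += w
    return loss / total_w if total_w else 0.0
```

For example, a document the agent read in depth might carry weight 1.0, one it merely skimmed 0.5, and one it rejected without reading 0.0; scoring the heavily weighted document higher then lowers the loss more than scoring the skimmed one higher would.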
Extensive testing on deep research benchmarks demonstrates the framework's effectiveness. Retrievers trained with LRAT consistently outperformed traditional models, showing significant improvements in evidence recall, end-to-end task success rates, and execution efficiency. These gains held across different agent architectures and model scales, indicating the approach is robust and scalable. The work establishes agent trajectories as a practical and rich source of supervision, pointing the way toward next-generation retrieval systems built for an AI-native world.
- LRAT framework trains IR models using AI agent interaction data, not human clicks, addressing a fundamental mismatch in agentic search.
- The system mines supervision from agent trajectories, analyzing signals like browsing actions, unbrowsed rejections, and post-browse reasoning traces.
- Experiments show LRAT improves evidence recall by up to 15% and boosts end-to-end task success across diverse agent architectures.
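The signal mining described in the second bullet could look roughly like the following, which collapses one trajectory into per-document intensity labels. The event names and weight values are hypothetical placeholders for whatever schema the paper actually defines.

```python
# Hypothetical trajectory events and their assumed intensity weights;
# the paper's real schema and values may differ.
INTENSITY = {
    "cited_in_reasoning": 1.5,  # agent referenced the doc in a later reasoning trace
    "browsed_in_depth":   1.0,  # agent opened and read the full document
    "skimmed":            0.5,  # opened briefly, then moved on
    "rejected_unread":    0.0,  # surfaced in results but never opened
}

def label_trajectory(steps):
    """Collapse one agent trajectory (a list of (doc_id, event) pairs)
    into per-document intensity labels, keeping the strongest signal
    observed for each document across the whole trajectory."""
    labels = {}
    for doc_id, event in steps:
        w = INTENSITY.get(event, 0.0)
        labels[doc_id] = max(labels.get(doc_id, 0.0), w)
    return labels
```

Taking the maximum per document reflects one plausible design choice: a document skimmed early but cited in reasoning later should keep its strongest evidence of utility, not an average diluted by earlier weak interactions.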
Why It Matters
This research is foundational for building effective AI agents that can reliably search, reason, and complete complex tasks, moving beyond systems designed for human users.