Research & Papers

AgentSim: A Platform for Verifiable Agent-Trace Simulation

New open-source tool creates grounded, stepwise agent reasoning data with a 100% grounding rate...

Deep Dive

AgentSim is an open-source platform designed to generate verifiable, stepwise traces of agent reasoning over any document collection, specifically targeting RAG (retrieval-augmented generation) workflows. Unlike existing datasets that focus on final answers (question-answering), ungrounded reasoning (chain-of-thought), or interface actions (web agents), AgentSim captures the core retrieval and synthesis steps. It employs two key mechanisms: Corpus-Aware Seeding, which ensures agents explore document sets broadly, and Active Validation, a multi-model pipeline that flags steps where models disagree, directing human annotators to the most challenging cases. This approach dramatically improves trace diversity and quality while focusing human effort efficiently.
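The paper does not publish the Active Validation API, but the disagreement-based triage it describes can be sketched roughly as follows (all names here — `StepJudgment`, `flag_for_review`, `triage` — are illustrative, not AgentSim's actual interface): several models judge each reasoning step, and only steps where their answers diverge are routed to human annotators.

```python
from dataclasses import dataclass

@dataclass
class StepJudgment:
    """One model's verdict on a single reasoning step (hypothetical schema)."""
    model: str
    answer: str

def flag_for_review(judgments: list[StepJudgment]) -> bool:
    """Flag a step for human annotation when the models disagree.

    Answers are normalized (whitespace, case) before comparison so that
    trivial formatting differences do not trigger review.
    """
    normalized = {j.answer.strip().lower() for j in judgments}
    return len(normalized) > 1

def triage(steps: dict[str, list[StepJudgment]]) -> list[str]:
    """Return the ids of steps that need human review."""
    return [step_id for step_id, js in steps.items() if flag_for_review(js)]
```

The design choice this illustrates is the one the summary names: human effort is spent only on contested steps, so annotation scales with model disagreement rather than with corpus size.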

The accompanying Agent-Trace Corpus (ATC) is a large collection of over 103,000 grounded reasoning steps spanning three established IR benchmarks, achieving a 100% grounding rate on substantive answers. The researchers also conducted a comparative behavioral analysis, revealing systematic differences in how state-of-the-art models approach information seeking. AgentSim is publicly available as a platform, toolkit, and corpus, providing a foundational resource for training more trustworthy agentic LLMs. By grounding reasoning traces in specific documents, AgentSim enables the development of AI agents that can explain their decision-making process step-by-step, a critical capability for high-stakes applications like legal research, medical diagnosis, and scientific discovery.
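The corpus's headline metric — a 100% grounding rate on substantive answers — can be made concrete with a small sketch. The record layout below is an assumption for illustration (ATC's actual schema is not given here): a step counts as grounded when its synthesis cites a document that was actually retrieved, and non-substantive steps are excluded from the denominator.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class TraceStep:
    """One retrieval-and-synthesis step in an agent trace (illustrative schema)."""
    query: str
    retrieved_doc_ids: list
    synthesis: str                       # empty string for non-substantive steps
    supporting_doc_id: Optional[str]     # document the synthesis cites, if any

def grounding_rate(steps: list) -> float:
    """Fraction of substantive steps whose cited document was retrieved."""
    substantive = [s for s in steps if s.synthesis]
    if not substantive:
        return 0.0
    grounded = [s for s in substantive
                if s.supporting_doc_id in s.retrieved_doc_ids]
    return len(grounded) / len(substantive)
```

A check like this is what makes the traces verifiable: every substantive claim can be traced back to a specific retrieved document rather than to ungrounded chain-of-thought.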

Key Points
  • AgentSim generates verifiable, stepwise reasoning traces for RAG agents using Corpus-Aware Seeding and Active Validation mechanisms.
  • The Agent-Trace Corpus (ATC) contains over 103,000 grounded reasoning steps across three IR benchmarks with 100% grounding rate.
  • A comparative behavioral analysis reveals systematic differences in how state-of-the-art models approach information seeking tasks.

Why It Matters

Enables training of trustworthy AI agents with transparent, document-grounded reasoning for high-stakes applications.