ToolSimulator: scalable tool testing for AI agents
Uses an LLM to simulate API responses, avoiding the risks of live calls and the brittleness of static mocks.
Strands has introduced ToolSimulator, a new framework within its Evals SDK designed to solve a critical bottleneck in AI agent development: safe and scalable testing. Agents that call APIs, query databases, or interact with external systems have traditionally been tested against live services, which is slow, risky, and can expose sensitive data; static mock responses, meanwhile, fail to capture the stateful, multi-turn nature of real workflows. ToolSimulator uses a large language model (LLM) to simulate tool behavior, generating adaptive, context-aware responses without ever touching a production system.
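Conceptually, the trick is to let an LLM play the tool's role at test time. The sketch below is illustrative only: the names (`call_llm`, `simulate_tool_call`) are made up rather than taken from the Strands Evals API, and the model call is stubbed so the example runs standalone.

```python
import json

def call_llm(prompt: str) -> str:
    """Placeholder for any LLM client call (hypothetical stub).

    A real setup would invoke a model here; a canned reply keeps the
    sketch self-contained and runnable.
    """
    return json.dumps(
        {"flights": [{"flight": "UA 212", "depart": "09:15", "price_usd": 342}]}
    )

def simulate_tool_call(tool_name: str, tool_args: dict) -> dict:
    """Ask an LLM to act as the tool instead of hitting a live API."""
    prompt = (
        f"You are simulating the tool '{tool_name}'.\n"
        f"Request arguments: {json.dumps(tool_args)}\n"
        "Respond with a plausible JSON payload only."
    )
    return json.loads(call_llm(prompt))

# The agent under test sees a realistic, request-specific response,
# but no production system was ever contacted.
response = simulate_tool_call(
    "search_flights", {"from": "SFO", "to": "JFK", "date": "2025-06-01"}
)
print(response["flights"][0]["flight"])
```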
The framework offers three core capabilities. First, its adaptive response generation creates plausible, request-specific outputs (like realistic flight options) instead of generic placeholders. Second, it maintains consistent shared state across tool calls, enabling accurate testing of multi-step processes like booking workflows. Third, it enforces response schemas using Pydantic models, catching malformed data before it reaches the agent. Available now in the Strands Evals SDK, ToolSimulator aims to help developers comprehensively test edge cases, isolate their tests from external dependencies, and ship reliable agents with greater confidence.
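Of the three capabilities, schema enforcement is the easiest to make concrete, since it maps directly onto plain Pydantic (v2 shown here). The `FlightOption` model and its fields are hypothetical, not part of the SDK:

```python
from pydantic import BaseModel, ValidationError

class FlightOption(BaseModel):
    flight: str
    depart: str
    price_usd: float

# A well-formed simulated payload validates cleanly.
ok = FlightOption.model_validate(
    {"flight": "UA 212", "depart": "09:15", "price_usd": 342}
)

# A malformed payload (price as prose) is rejected before it ever
# reaches the agent under test.
try:
    FlightOption.model_validate(
        {"flight": "UA 212", "depart": "09:15", "price_usd": "cheap"}
    )
except ValidationError as exc:
    print(f"rejected simulated response: {exc.error_count()} error(s)")
```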
- Simulates API tools with an LLM to avoid risky live calls and data exposure.
- Maintains state across multi-turn workflows, where static mocks break down on complex agents; a toy illustration follows this list.
- Enforces response structure with Pydantic schemas to catch integration bugs early.
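To see why shared state matters, the toy simulator below carries a booking from one turn to the next. All names here (`StatefulSimulator`, `book_flight`, `get_booking`) are invented for illustration and do not reflect the SDK's actual classes:

```python
class StatefulSimulator:
    """Toy simulator that shares state across tool calls in one test session."""

    def __init__(self) -> None:
        self.bookings: dict[str, str] = {}  # confirmation id -> flight

    def book_flight(self, flight: str) -> str:
        confirmation = f"CONF-{len(self.bookings) + 1:04d}"
        self.bookings[confirmation] = flight
        return confirmation

    def get_booking(self, confirmation: str) -> str | None:
        # A later turn sees the result of an earlier one, which a
        # static, call-by-call mock cannot reproduce.
        return self.bookings.get(confirmation)

sim = StatefulSimulator()
conf = sim.book_flight("UA 212")          # turn 1: agent books a flight
assert sim.get_booking(conf) == "UA 212"  # turn 2: lookup reflects turn 1
```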
Why It Matters
Enables faster, safer development cycles for production AI agents by removing testing bottlenecks and security risks.