ToolSimulator: scalable tool testing for AI agents
Uses an LLM to simulate API responses, avoiding the risks of live calls and the brittleness of static mocks.
Strands has introduced ToolSimulator, a new framework within its Evals SDK designed to solve a critical bottleneck in AI agent development: safe and scalable testing. Agents that call APIs, query databases, or interact with external systems have traditionally been tested against live services, which is slow, risky, and can expose sensitive data; static mock responses, meanwhile, fail to capture the stateful, multi-turn nature of real workflows. ToolSimulator uses a large language model (LLM) to simulate tool behavior, generating adaptive, context-aware responses without ever touching a production system.
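Conceptually, the trick is to let an LLM play the tool's role at test time. The sketch below is illustrative only: the names (`call_llm`, `simulate_tool_call`) are made up rather than taken from the Strands Evals API, and the model call is stubbed so the example runs standalone.

```python
import json

def call_llm(prompt: str) -> str:
    """Placeholder for any LLM client call (hypothetical stub).

    A real setup would invoke a model here; a canned reply keeps the
    sketch self-contained and runnable.
    """
    return json.dumps(
        {"flights": [{"flight": "UA 212", "depart": "09:15", "price_usd": 342}]}
    )

def simulate_tool_call(tool_name: str, tool_args: dict) -> dict:
    """Ask an LLM to act as the tool instead of hitting a live API."""
    prompt = (
        f"You are simulating the tool '{tool_name}'.\n"
        f"Request arguments: {json.dumps(tool_args)}\n"
        "Respond with a plausible JSON payload only."
    )
    return json.loads(call_llm(prompt))

# The agent under test sees a realistic, request-specific response,
# but no production system was ever contacted.
response = simulate_tool_call(
    "search_flights", {"from": "SFO", "to": "JFK", "date": "2025-06-01"}
)
print(response["flights"][0]["flight"])
```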
The framework offers three core capabilities. First, its adaptive response generation creates plausible, request-specific outputs (like realistic flight options) instead of generic placeholders. Second, it maintains consistent shared state across tool calls, enabling accurate testing of multi-step processes like booking workflows. Third, it enforces response schemas using Pydantic models, catching malformed data before it reaches the agent. Available now in the Strands Evals SDK, ToolSimulator aims to help developers comprehensively test edge cases, isolate their tests from external dependencies, and ship reliable agents with greater confidence.
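Of the three capabilities, schema enforcement is the easiest to make concrete, since it maps directly onto plain Pydantic (v2 shown here). The `FlightOption` model and its fields are hypothetical, not part of the SDK:

```python
from pydantic import BaseModel, ValidationError

class FlightOption(BaseModel):
    flight: str
    depart: str
    price_usd: float

# A well-formed simulated payload validates cleanly.
ok = FlightOption.model_validate(
    {"flight": "UA 212", "depart": "09:15", "price_usd": 342}
)

# A malformed payload (price as prose) is rejected before it ever
# reaches the agent under test.
try:
    FlightOption.model_validate(
        {"flight": "UA 212", "depart": "09:15", "price_usd": "cheap"}
    )
except ValidationError as exc:
    print(f"rejected simulated response: {exc.error_count()} error(s)")
```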
- Simulates API tools with an LLM to avoid risky live calls and data exposure.
- Maintains state across multi-turn workflows, where static mocks break down on complex agents; a toy illustration follows this list.
- Enforces response structure with Pydantic schemas to catch integration bugs early.
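To see why shared state matters, the toy simulator below carries a booking from one turn to the next. All names here (`StatefulSimulator`, `book_flight`, `get_booking`) are invented for illustration and do not reflect the SDK's actual classes:

```python
class StatefulSimulator:
    """Toy simulator that shares state across tool calls in one test session."""

    def __init__(self) -> None:
        self.bookings: dict[str, str] = {}  # confirmation id -> flight

    def book_flight(self, flight: str) -> str:
        confirmation = f"CONF-{len(self.bookings) + 1:04d}"
        self.bookings[confirmation] = flight
        return confirmation

    def get_booking(self, confirmation: str) -> str | None:
        # A later turn sees the result of an earlier one, which a
        # static, call-by-call mock cannot reproduce.
        return self.bookings.get(confirmation)

sim = StatefulSimulator()
conf = sim.book_flight("UA 212")          # turn 1: agent books a flight
assert sim.get_booking(conf) == "UA 212"  # turn 2: lookup reflects turn 1
```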
Why It Matters
Enables faster, safer development cycles for production AI agents by removing testing bottlenecks and security risks.