Developer Tools

Introducing the agent quality loop: AgentCore Optimization now in preview

New optimization preview uses production traces to auto-generate and validate agent improvements.

Deep Dive

AWS has announced an Optimization preview for Amazon Bedrock AgentCore that establishes a complete observe-evaluate-improve loop for AI agents. The feature tackles a common problem: agent quality degrades after launch as user behavior shifts, underlying models evolve, and prompts get reused in new contexts. Traditionally, teams relied on manual trace analysis and intuition-driven prompt rewrites, which often introduced new issues. AgentCore’s new capabilities automate this cycle: the Recommendations API analyzes production traces stored in CloudWatch Logs and rewrites either the system prompt or tool descriptions to maximize a chosen reward signal, such as a built-in evaluator like goal success rate or a custom LLM-as-judge score.
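
The announcement names LLM-as-judge scores as one kind of custom reward signal; the evaluator contract itself isn't public, but the idea is simple to sketch. Below is a minimal, hypothetical example that grades a single trace via the Bedrock Converse API. The judge prompt, model ID, and JSON score format are assumptions, not AgentCore's actual interface.

```python
# Hypothetical LLM-as-judge reward signal: score one agent transcript 0-1.
# The prompt, model ID, and JSON reply format are illustrative assumptions;
# only the Bedrock Converse API call itself is real.
import json
import boto3

bedrock = boto3.client("bedrock-runtime")

JUDGE_INSTRUCTIONS = (
    "You are grading an AI agent's conversation transcript. Score how well "
    "the agent achieved the user's goal on a scale from 0.0 to 1.0. Reply "
    'with JSON only: {"score": <float>, "reason": "<one sentence>"}'
)

def judge_reward(transcript: str,
                 model_id: str = "anthropic.claude-3-5-sonnet-20240620-v1:0") -> float:
    """Return a 0-1 reward for one trace, as scored by a judge model."""
    response = bedrock.converse(
        modelId=model_id,
        system=[{"text": JUDGE_INSTRUCTIONS}],
        messages=[{"role": "user", "content": [{"text": "Transcript:\n" + transcript}]}],
        inferenceConfig={"temperature": 0.0, "maxTokens": 200},
    )
    reply = response["output"]["message"]["content"][0]["text"]
    return float(json.loads(reply)["score"])  # assumes the judge obeyed the format
```

An optimizer maximizing this signal would compare candidate prompts by the average reward they earn across a set of traces.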

Validation happens in two ways. Batch evaluation tests a recommendation against a predefined test dataset, or against a simulated dataset generated by an LLM-backed actor playing the end-user role, reporting aggregate scores to catch regressions. A/B testing then runs through AgentCore Gateway, splitting live production traffic at configurable percentages and reporting results with confidence intervals and statistical significance. Together these replace the slow, manual tuning cycle with systematic, data-backed evidence. As NTT DATA’s Yoshiharu Okuda noted, processes that once required weeks of manual tuning now become rapid, repeatable cycles. AgentCore’s end-to-end, OpenTelemetry-compatible traceability captures every model call, tool invocation, and reasoning step, enabling continuous optimization at scale for the thousands of developers already using the platform.
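
Gateway computes the significance for you, but the statistics behind such a report are standard. Here is a sketch of a two-proportion z-test comparing goal-success rates of two agent variants; the function name and the counts in the usage line are made up for illustration, not AgentCore output.

```python
# Two-proportion z-test: is variant B's goal-success rate significantly
# different from variant A's? Returns the lift, a 95% confidence interval
# on the difference, and a two-sided p-value.
from math import erf, sqrt

def ab_report(success_a: int, total_a: int, success_b: int, total_b: int) -> dict:
    p_a, p_b = success_a / total_a, success_b / total_b
    # Pooled standard error under the null hypothesis of no difference.
    pooled = (success_a + success_b) / (total_a + total_b)
    se_null = sqrt(pooled * (1 - pooled) * (1 / total_a + 1 / total_b))
    z = (p_b - p_a) / se_null
    p_value = 2 * (1 - 0.5 * (1 + erf(abs(z) / sqrt(2))))  # two-sided
    # Unpooled standard error for the confidence interval on the lift.
    se_diff = sqrt(p_a * (1 - p_a) / total_a + p_b * (1 - p_b) / total_b)
    ci95 = (p_b - p_a - 1.96 * se_diff, p_b - p_a + 1.96 * se_diff)
    return {"lift": p_b - p_a, "ci95": ci95, "p_value": p_value}

# e.g. baseline agent: 412/500 successes; optimized agent: 451/500.
print(ab_report(412, 500, 451, 500))
```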

Key Points
  • Recommendations API analyzes production traces in CloudWatch Logs to optimize system prompts or tool descriptions based on a chosen reward signal (built-in or custom evaluator).
  • Batch evaluation validates changes against predefined test datasets or datasets simulated by an LLM-backed actor playing the end user, catching regressions before deployment (see the sketch after this list).
  • A/B testing via AgentCore Gateway splits live traffic between agent versions with configurable percentages and reports results with statistical significance.
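
To make the simulated-dataset idea in the second point concrete, here is a hypothetical batch-evaluation loop: an LLM-backed actor plays the end user against a candidate agent, and each finished transcript is scored by the reward signal. The user_actor, candidate_agent, judge_reward, and make_actor callables are assumptions standing in for AgentCore's managed machinery.

```python
# Hypothetical batch evaluation with a simulated user. Stand-in callables:
#   user_actor(history) -> next simulated user turn, or None to stop
#   candidate_agent(history) -> the agent's reply to the conversation so far
#   judge_reward(transcript) -> 0-1 score for a finished conversation
from statistics import mean

def simulate_conversation(user_actor, candidate_agent, max_turns: int = 8) -> str:
    history = []  # list of (speaker, text) tuples
    for _ in range(max_turns):
        user_turn = user_actor(history)
        if user_turn is None:  # the actor decides the conversation is over
            break
        history.append(("user", user_turn))
        history.append(("agent", candidate_agent(history)))
    return "\n".join(f"{speaker}: {text}" for speaker, text in history)

def batch_evaluate(scenarios, candidate_agent, judge_reward, make_actor) -> float:
    """Average reward over simulated conversations, one per test scenario."""
    scores = []
    for scenario in scenarios:  # e.g. "return a damaged item", "change a flight"
        transcript = simulate_conversation(make_actor(scenario), candidate_agent)
        scores.append(judge_reward(transcript))
    return mean(scores)
```

Comparing this aggregate score for the current and recommended agent versions is what lets a regression be caught before any live traffic is touched.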

Why It Matters

Automates the tedious manual optimization cycle for production AI agents, enabling continuous, data-driven improvement at scale.