Developer Tools

Introducing OS Level Actions in Amazon Bedrock AgentCore Browser

AI agents can now click system dialogs, not just web elements.

Deep Dive

Amazon Web Services has introduced OS Level Actions for its AgentCore Browser, a secure browser environment designed for AI agents that automate web workflows. Previously, agents relied on Playwright and the Chrome DevTools Protocol (CDP) to interact with the Document Object Model (DOM), which works for most web automation but hits a hard boundary at operating-system-rendered UI: native dialogs, security prompts, certificate choosers, context menus, and even Chrome settings. These elements are invisible to CDP and unactionable by Playwright, causing production failures. The new OS Level Actions, accessible through the InvokeBrowser API, expose direct mouse and keyboard control at the OS level, allowing agents to interact with any content visible on the screen, not just within the browser's web layer.
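To make the mechanics concrete, here is a minimal sketch of how a single OS-level action might be packaged for dispatch. Only the `x-amzn-browser-session-id` header and the action names come from the announcement; the endpoint shape, field names (`action`, `parameters`), and parameter keys are assumptions for illustration, not the documented InvokeBrowser request format.

```python
import json

# Session header named in the announcement; everything else below is assumed.
SESSION_HEADER = "x-amzn-browser-session-id"

def build_action_request(session_id: str, action: str, **params):
    """Build headers and a JSON body for one OS-level action dispatch.

    The payload field names ("action", "parameters") are hypothetical;
    consult the InvokeBrowser API reference for the real schema.
    """
    headers = {SESSION_HEADER: session_id, "Content-Type": "application/json"}
    body = json.dumps({"action": action, "parameters": params})
    return headers, body

# Example: a left click at screen coordinates (640, 360).
headers, body = build_action_request("sess-123", "mouseClick", x=640, y=360, button="left")
```

The key point the sketch captures is that each request is tied to an active browser session via the header, and each dispatched action targets raw screen coordinates rather than DOM selectors.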

OS Level Actions provide eight distinct commands: mouseClick, mouseMove, mouseDrag, mouseScroll, keyType, keyPress, keyShortcut, and a screen capture command. Each action is dispatched individually and returns a SUCCESS or FAILED status, with the active session identified via the x-amzn-browser-session-id header. The recommended interaction pattern is an action-screenshot-reaction loop: the agent sends an action (e.g., click at coordinates), requests a full-desktop screenshot (including native UI), and uses a vision model to reason about the current state before sending the next action. This loop handles dynamic scenarios like macOS privacy dialogs or Windows Security prompts that appear mid-workflow. For example, when a web application calls window.print(), the system print dialog is now clickable. Keyboard shortcuts and right-click context menus are also supported, eliminating previous blind spots in web automation.
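The action-screenshot-reaction loop can be sketched as a simple control flow. In this hypothetical sketch, `send_action`, `take_screenshot`, and `decide_next` are stand-ins for the real InvokeBrowser dispatch, full-desktop screenshot capture, and a vision model's decision step; only the SUCCESS/FAILED statuses and the loop structure come from the source.

```python
def run_agent_loop(send_action, take_screenshot, decide_next, max_steps=10):
    """Observe the full desktop, let a vision model pick an action, dispatch it, repeat.

    All three callables are injected stand-ins (assumptions), which also
    makes the loop easy to exercise with stubs.
    """
    for _ in range(max_steps):
        action = decide_next(take_screenshot())   # reason over the screen, incl. native UI
        if action is None:                        # vision model signals the task is complete
            return "DONE"
        if send_action(action) == "FAILED":       # each action returns SUCCESS or FAILED
            return "FAILED"
    return "STOPPED"                              # step budget exhausted

# Toy stubs: the "vision model" issues two clicks, then reports completion.
planned = iter([
    {"action": "mouseClick", "x": 100, "y": 200},
    {"action": "mouseClick", "x": 300, "y": 400},
    None,
])
result = run_agent_loop(
    send_action=lambda a: "SUCCESS",
    take_screenshot=lambda: b"fake-png-bytes",
    decide_next=lambda screenshot: next(planned),
)
```

Because the loop re-screenshots after every action, a native dialog that appears mid-workflow (a macOS privacy prompt, a Windows Security dialog, a system print dialog) simply shows up in the next observation and becomes the target of the next click.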

Key Points
  • OS Level Actions expose mouse, keyboard, and screen capture controls that operate outside the browser DOM, using the InvokeBrowser API
  • Eight actions available: mouseClick, mouseMove, mouseDrag, mouseScroll, keyType, keyPress, keyShortcut, plus full-desktop screenshot capture
  • Pattern uses action-screenshot-reaction loop, enabling vision-enabled agents to reason about and act on native UI like macOS dialogs or Windows Security prompts

Why It Matters

Enables AI agents to automate workflows that previously broke at OS boundaries, expanding automation scope beyond web-only interactions.