Auto-review of agent actions without synchronous human oversight
OpenAI replaces manual approvals with an auto-reviewer, catching dangerous actions at scale.
OpenAI has introduced Auto-review in Codex, a new mode that bridges the gap between requiring constant human approval and granting full unrestricted access. Until now, users faced a trade-off: Default mode required a human sign-off for every action at the sandbox boundary, while Full Access mode removed friction but also eliminated oversight. Auto-review offers a third path by replacing human approval with a separate AI agent that reviews actions.
Internally, Auto-review increases confidence in running long agentic tasks without synchronous human oversight. In practice, Codex sessions stop for human approval roughly 200 times less often than in manual approval mode, yet the system still catches many of the actions humans would want stopped. Most actions run without approval in the sandbox. For the small fraction that need review, Auto-review approves around 99% of them. Even when Auto-review rejects an action, Codex often recovers on its own by finding a safer way to make progress. The feature performed well on evaluations of dangerous actions, giving OpenAI enough confidence to deploy it internally: conservative enough for security teams, unobtrusive enough for research and deployment workflows.
- Reduces human approval stops by ~200x compared to manual approval mode.
- Catches dangerous actions while approving ~99% of the actions it reviews.
- Codex can autonomously recover from rejected actions by finding safer alternatives.
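The flow described above can be sketched as a gating loop. This is a minimal illustration, not OpenAI's implementation: the `Action`, `reviewer_approves`, and `fallback` names are hypothetical, and the reviewer here is a trivial deny-list standing in for the actual AI reviewer. Actions inside the sandbox run freely; actions that cross the sandbox boundary are gated by the reviewer; on rejection, the agent retries with a safer alternative.

```python
from dataclasses import dataclass

@dataclass
class Action:
    command: str
    escapes_sandbox: bool  # whether the action crosses the sandbox boundary

def reviewer_approves(action: Action) -> bool:
    # Hypothetical stand-in for the AI reviewer: a simple deny-list.
    return "rm -rf" not in action.command

def run(action: Action) -> str:
    # Placeholder for actually executing the action.
    return f"ran: {action.command}"

def auto_review_loop(actions, fallback):
    """Run each action; boundary-crossing actions are gated by the reviewer.
    On rejection, the agent recovers by proposing a safer alternative."""
    log = []
    for action in actions:
        if not action.escapes_sandbox:
            log.append(run(action))   # most actions run inside the sandbox, unreviewed
        elif reviewer_approves(action):
            log.append(run(action))   # reviewer approved the boundary-crossing action
        else:
            safer = fallback(action)  # agent proposes a safer way to make progress
            if safer is not None and reviewer_approves(safer):
                log.append(run(safer))
            else:
                log.append(f"blocked: {action.command}")
    return log
```

In this sketch, only the second and third branches involve the reviewer at all, mirroring the claim that most actions run without approval in the sandbox and only a small fraction ever need review.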
Why It Matters
Enables long-running autonomous AI agents to operate safely without constant human oversight.