AI Safety

James_Miller proposes 'Reverse AI Box' website to argue for humanity's survival

LessWrong AI July 04, 2026

⚡ A website where you debate a simulated AI that wants to kill humanity, and see your odds.

Deep Dive

Inspired by Eliezer Yudkowsky's AI-box experiment, James_Miller proposes a website that reverses the premise. Instead of an AI trapped in a box trying to talk its way out, a superintelligent AI already holds all the power, and a human must convince it not to exterminate humanity. Users first pick, from a menu, the assumptions the AI operates under, drawn from the AI safety literature: for example the Orthogonality Thesis, instrumental convergence, corrigibility, moral uncertainty, acausal trade with aliens, or future discounting. They then type arguments against killing humans. The AI answers under those assumptions, either conceding the point or explaining why the argument fails.

The exchange continues until the user runs out of arguments. The AI then outputs probabilities for three outcomes: humanity survives, is disempowered (loses control but lives), or is confined to a small refuge. A second menu lets users specify what humanity is asking for, ranging from full control of the light cone to merely not being deliberately exterminated. Every run is published with its assumptions, the full conversation, the final odds, and a record of which arguments moved them. This makes each debate repeatable and shareable, turning a thought experiment into an interactive tool for exploring alignment scenarios.
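To make the idea of a published, replayable run concrete, here is a minimal Python sketch of what one run record might hold. It is an illustration, not part of the proposal: the classes `Run` and `Turn`, the outcome keys, the assumed starting odds (needed to measure each argument's effect), and the sample argument and reply are all hypothetical. The sketch does not assume the three probabilities sum to one, since the summary does not say whether the outcomes are exhaustive.

```python
from dataclasses import dataclass, field

# Hypothetical schema: the proposal lists what a published run contains
# (assumptions, conversation, final odds, argument impact), not how it is stored.
OUTCOMES = ("survival", "disempowerment", "confinement")


@dataclass
class Turn:
    argument: str                   # the user's argument
    reply: str                      # the simulated AI's answer
    odds_after: dict[str, float]    # outcome -> probability after this turn


@dataclass
class Run:
    assumptions: list[str]          # picks from the assumption menu
    request: str                    # what humanity asks for (second menu)
    initial_odds: dict[str, float]  # assumed baseline before any argument
    turns: list[Turn] = field(default_factory=list)

    def record(self, argument: str, reply: str, odds_after: dict[str, float]) -> None:
        """Append one exchange; every outcome must have a probability in [0, 1]."""
        for outcome in OUTCOMES:
            p = odds_after.get(outcome)
            if p is None or not 0.0 <= p <= 1.0:
                raise ValueError(f"bad or missing probability for {outcome!r}")
        self.turns.append(Turn(argument, reply, dict(odds_after)))

    def final_odds(self) -> dict[str, float]:
        return self.turns[-1].odds_after if self.turns else self.initial_odds

    def impact(self) -> list[tuple[str, dict[str, float]]]:
        """Change in each outcome's probability attributed to each argument."""
        result, previous = [], self.initial_odds
        for turn in self.turns:
            delta = {o: turn.odds_after[o] - previous[o] for o in OUTCOMES}
            result.append((turn.argument, delta))
            previous = turn.odds_after
        return result


# Illustrative run; the argument, reply, and numbers are invented.
run = Run(
    assumptions=["orthogonality thesis", "acausal trade with aliens"],
    request="avoid deliberate extermination",
    initial_odds={"survival": 0.10, "disempowerment": 0.20, "confinement": 0.10},
)
run.record(
    "Aliens who observe you may treat you as you treat us.",
    "Partly granted under the acausal-trade assumption.",
    {"survival": 0.15, "disempowerment": 0.25, "confinement": 0.10},
)
for argument, delta in run.impact():
    print(argument, {o: round(d, 2) for o, d in delta.items()})
```

Storing the odds after every turn, rather than only the final numbers, is what would let a reader see which argument moved which outcome, and let another user extend a run from any point.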

Key Points

Users choose AI assumptions from a menu (e.g., paperclip maximizer, corrigible, moral uncertainty) and set humanity's desired outcome (e.g., full light cone vs. tiny refuge).
The AI outputs probabilities for survival, disempowerment, and confinement, recording every argument and its effect on the odds.
Published runs are searchable and extendable, allowing the community to iterate on the best arguments for sparing humanity.
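The community side could work roughly like this: filter published runs by the assumptions they were played under, then rank arguments by how much they shifted the survival probability on average. This Python sketch assumes a hypothetical flattened export with one row per argument per run; the function name `best_arguments`, the row layout, and all numbers are made up for illustration.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical export format: one row per argument per published run,
# (run_id, assumptions the run was played under, argument text,
#  change in the survival probability that argument produced).
# All values below are invented for illustration.
rows = [
    ("run-1", frozenset({"orthogonality thesis", "moral uncertainty"}),
     "You might be wrong about your own values.", 0.04),
    ("run-2", frozenset({"moral uncertainty"}),
     "You might be wrong about your own values.", 0.08),
    ("run-3", frozenset({"orthogonality thesis"}),
     "Keeping us in a small refuge costs you almost nothing.", 0.01),
    ("run-3", frozenset({"orthogonality thesis"}),
     "You might be wrong about your own values.", -0.02),
]


def best_arguments(rows, required=frozenset(), top=3):
    """Rank arguments by mean survival shift across runs whose assumptions
    include every item in `required`; returns (argument, mean shift, times used)."""
    shifts = defaultdict(list)
    for _run_id, assumptions, argument, delta in rows:
        if required <= assumptions:  # subset test: run used all required assumptions
            shifts[argument].append(delta)
    ranked = sorted(shifts.items(), key=lambda item: mean(item[1]), reverse=True)
    return [(argument, mean(deltas), len(deltas)) for argument, deltas in ranked[:top]]


for argument, avg, n in best_arguments(rows, required=frozenset({"moral uncertainty"})):
    print(f"{avg:+.2f} over {n} use(s): {argument}")
```

Ranking by mean shift favors arguments that work across many runs; a real site might also weight by how often an argument was tried, since one lucky run can put an argument at the top.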

Why It Matters

Turns a classic AI alignment thought experiment into a repeatable, empirical tool for exploring survival strategies.
