All technical alignment plans are steps in the dark
Richard Juggins argues we can't test superintelligent AI safety solutions before deployment.
In a viral post on LessWrong titled 'All technical alignment plans are steps in the dark,' AI safety researcher Richard Juggins presents a fundamental critique of current approaches to aligning superintelligent AI. He argues the core challenge is the absence of empirical feedback: unlike normal engineering, where you can test, fail, and iterate, we likely get only one shot with a superintelligent system. If that system is misaligned and powerful enough to take over the world, there is no second chance. This creates a 'one-shot' problem in which solutions must work perfectly on the first try, without the iterative refinement that defines scientific progress.
Juggins analyzes the dominant industry approach: empirically testing alignment techniques on the strongest available AI systems (such as GPT-4 or Claude 3) and hoping the lessons generalize to superintelligence. He notes this plan has drawn criticism for assuming that techniques which work on sub-human AI will scale. His central thesis is that this flaw is structural: *all* technical plans, whether theoretical or empirical, are forced to make untested leaps, or 'steps in the dark,' because the feedback loop is missing. The only viable path forward, he concludes, is to dramatically shrink these steps by finding ways to safely iterate on the full alignment problem before building a potentially catastrophic system.
- The 'one-shot' problem: Aligning superintelligent AI lacks the test-fail-iterate feedback loop of normal science and engineering.
- Critique of empirical safety: Techniques tested on today's AI (like GPT-4) may not generalize to superintelligent systems.
- Structural flaw: All alignment plans, theoretical or empirical, involve untested 'steps in the dark' because the critical feedback loop is missing.
Why It Matters
Highlights a foundational, unsolved risk in the race to develop superintelligent AI, one that could reshape safety research priorities.