Open internship position + call for collaboration on threat model-dependent alignment, governance, and offense/defense balance
New project aims to deconfuse AI alignment, governance, and offense/defense balance by focusing on threat models.
The Existential Risk Observatory, together with MIT FutureTech and the Future of Life Institute (FLI), is running a project titled 'Solving the Right Problem: Towards Researchers Consensus on AI Existential Threat Models.' The team is currently conducting a comprehensive literature review and building a taxonomy that catalogs existential threat models from leading thinkers such as Yudkowsky, Bostrom, Christiano, and Kulveit. The goal is to make explicit the underlying assumptions and key cruxes that differentiate these models, since researchers often talk past each other when they have different threat models in mind but leave them unstated.
Once the taxonomy is complete, the project will shift to three downstream areas: threat model-dependent AI alignment, threat model-dependent AI governance, and threat model-dependent offense/defense balance. The team believes that explicitly tying each subfield to specific threat models can reduce confusion and yield more targeted solutions. An internship position (application deadline May 11) is open for candidates to assist with this work, and the team also welcomes direct collaboration from interested researchers via info@existentialriskobservatory.org.
- Project led by Existential Risk Observatory in partnership with MIT FutureTech and FLI.
- Current focus: literature review and taxonomy of existential AI threat models from Yudkowsky, Bostrom, Christiano, Kulveit, and others.
- Next phases will examine alignment, governance, and offense/defense balance through a threat model-dependent lens; internship deadline May 11.
Why It Matters
Clarifying threat models is crucial for effective AI safety research and policy, and for avoiding misdirected efforts across the field.