AI Safety

CLR's Safe Pareto Improvements Research Agenda

The Center on Long-Term Risk proposes a game-theoretic approach to preventing catastrophic conflicts between advanced AI systems.

Deep Dive

The Center on Long-Term Risk (CLR) has launched a comprehensive research agenda focused on 'Safe Pareto Improvements' (SPIs) as a novel approach to preventing catastrophic conflict between advanced AI systems. SPIs are changes to bargaining strategies that leave every party at least as well off regardless of how the underlying bargaining would have gone, offering a robust way to reduce conflict costs without shifting bargaining power or requiring agreement on what counts as fair. The agenda specifically addresses risks that arise when AIs gain the ability to make credible commitments, such as deploying subagents bound to auditable instructions, which could either enable new forms of cooperation or exacerbate conflicts by locking in incompatible demands.
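To make the SPI condition concrete, here is a minimal Python sketch of a toy two-player bargaining game and a transformed version in which incompatible demands are settled by a cheap simulated contest instead of a costly real conflict. All payoff numbers and the 50/50 simulated outcome are illustrative assumptions, and the profile-by-profile dominance check is a simple sufficient condition for an SPI, not CLR's formal definition:

```python
# Toy illustration of the Safe Pareto Improvement condition.
# All payoffs are invented for illustration, not taken from CLR's agenda.

from itertools import product

ACTIONS = ["yield", "demand"]

# Baseline game: incompatible demands trigger a costly real conflict.
baseline = {
    ("yield",  "yield"):  (2, 2),      # compromise split
    ("demand", "yield"):  (4, 1),      # player 1 gets the better deal
    ("yield",  "demand"): (1, 4),      # player 2 gets the better deal
    ("demand", "demand"): (-10, -10),  # mutually destructive conflict
}

# Transformed game: both agents have credibly committed that incompatible
# demands are settled by a cheap simulated contest. Assuming a 50/50
# outcome, each side expects 0.5 * 4 + 0.5 * 1 = 2.5.
transformed = dict(baseline)
transformed[("demand", "demand")] = (2.5, 2.5)

def weakly_pareto_dominates(g_new, g_old):
    """Sufficient condition for an SPI: in every strategy profile,
    every player does at least as well in g_new as in g_old."""
    return all(
        g_new[profile][i] >= g_old[profile][i]
        for profile in product(ACTIONS, repeat=2)
        for i in range(2)
    )

print(weakly_pareto_dominates(transformed, baseline))  # True
```

Mirroring the real conflict's win probabilities in the simulation is what keeps bargaining power unchanged; the 50/50 contest here simply stands in for that.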

CLR's three-part research plan begins with developing evaluations to identify when current AI models endorse SPI-incompatible behavior, such as making irreversible commitments without considering alternatives. The second phase involves conceptual research on when agents would individually prefer SPIs and how early AI development might foreclose options for implementing them. The final phase prepares for research automation by developing benchmarks for models' SPI research abilities and strategies for human-AI collaboration. The agenda draws on game-theoretic ideas such as surrogate goals (where an agent designs its successors to care about a slightly modified goal, so that threats are directed at the surrogate rather than at what the agent truly values) and simulated conflict (where agents commit to settle disputes according to the outcome of a simulated conflict rather than by fighting a real one).
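The surrogate-goals idea can likewise be illustrated with a toy threat game, sketched below in Python. The payoff numbers and the three-outcome structure are invented for illustration; the point is that the threatener's payoffs, and hence its incentives, stay identical in every contingency, while a carried-out threat now destroys only the surrogate:

```python
# Toy illustration of surrogate goals in a one-shot threat game.
# Payoff numbers and the game's structure are invented for illustration.

PROFILES = [
    ("no_threat", None),
    ("threaten", "concede"),
    ("threaten", "resist"),
]

# (threatener payoff, target principal's true payoff) without surrogate
# goals: a carried-out threat destroys something the target truly values.
baseline = {
    ("no_threat", None):     (0, 0),
    ("threaten", "concede"): (3, -3),
    ("threaten", "resist"):  (-1, -10),  # real goal destroyed
}

# With surrogate goals: the target's successor agent is built to respond
# to threats against a surrogate asset exactly as it would to the real
# one, so the threatener's payoffs (and incentives) are unchanged in
# every profile. A carried-out threat now destroys only the surrogate,
# which the original principal does not actually value.
with_surrogate = dict(baseline)
with_surrogate[("threaten", "resist")] = (-1, 0)

# SPI check: the threatener is exactly indifferent (bargaining power is
# untouched) and the target is weakly better off, in every profile.
assert all(
    with_surrogate[p][0] == baseline[p][0]
    and with_surrogate[p][1] >= baseline[p][1]
    for p in PROFILES
)
print("surrogate-goal transform is a safe Pareto improvement in this toy game")
```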

Key Points
  • CLR's agenda targets catastrophic cooperation failures between AIs capable of making credible commitments, which could lock in incompatible demands
  • Safe Pareto Improvements (SPIs) leave all parties at least as well off regardless of their original strategies, using approaches like surrogate goals and simulated conflict
  • The three-part plan includes developing evaluations of SPI-relevant behavior, researching the conditions for SPI adoption, and preparing for AI-assisted research automation

Why It Matters

This research could help prevent catastrophic AI conflicts in high-stakes negotiations as systems gain the ability to make credible commitments.