AI Safety

Schelling Goodness, and Shared Morality as a Goal

New framework uses game theory to predict moral consensus among diverse agents with no shared history.

Deep Dive

Multiplicity.ai has published a conceptual framework called 'Schelling goodness' that applies game theory to the challenge of AI alignment. The core idea adapts Thomas Schelling's coordination games, in which participants try to predict each other's choices in order to reach a mutually beneficial outcome, to moral reasoning. (In Schelling's classic example, strangers asked to meet in New York City without communicating converge overwhelmingly on Grand Central Terminal at noon.) In this framework, hypothetical agents from entirely different but successful civilizations attempt to converge on binary answers (good/bad) to moral questions. They share no history or cultural context, only common knowledge of the question itself and the background pressure of civilizational survival. The goal is not to declare an objective morality, but to predict what such a diverse group would agree upon when explicitly trying to coordinate.
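To make the setup concrete, here is a minimal sketch of the pure coordination game being adapted. It is not from the essay: the agent count, the priors, and the payoff rule are illustrative assumptions. Each agent answers a binary question while trying to match the others, and even a weak shared pressure toward one answer is enough to make it the focal choice.

```python
import random
from collections import Counter

ANSWERS = ("good", "bad")

def best_guess(prior):
    """Each agent picks the answer its prior says the others
    are most likely to give (it cannot communicate with them)."""
    return max(ANSWERS, key=lambda a: prior[a])

def play_round(priors):
    """All agents choose simultaneously and independently."""
    return [best_guess(p) for p in priors]

def coordination_payoff(choices):
    """Pure coordination: payoff 1 for agents whose answer matches
    the majority, 0 otherwise. Agreement is all that is rewarded."""
    majority, _ = Counter(choices).most_common(1)[0]
    return [1 if c == majority else 0 for c in choices]

random.seed(0)

# Agents from "different civilizations": independent priors that share
# only a weak tilt, standing in for the common pressure of survival
# (e.g. cooperation tends to look "good" to any civilization that lasted).
priors = []
for _ in range(7):
    tilt = random.uniform(-0.05, 0.25)   # mostly, but not always, pro-"good"
    priors.append({"good": 0.5 + tilt, "bad": 0.5 - tilt})

choices = play_round(priors)
print(choices)                     # the modal answer is the Schelling point
print(coordination_payoff(choices))
```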

The technical approach defines 'Schelling-good' as the likely consensus answer in a forced-choice scenario where agents aim for mutual agreement. This provides a potential reference point for aligning AI systems whose intelligence may surpass our own and whose values may diverge from ours. The essay carefully distinguishes these coordination-game predictions from first-order moral claims, labeling speculative sections clearly. For AI safety researchers, this offers a formal method for exploring stable moral equilibria that diverse intelligences might reach independently. The next step is to test the framework empirically: place AI models in these hypothetical coordination scenarios and measure whether they converge on the same answers.
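A minimal sketch of what such a test could look like, assuming a hypothetical query_model stub in place of real model APIs (the prompt wording and model names are also illustrative): pose the coordination scenario to several models independently and measure how often their forced binary answers agree.

```python
from collections import Counter

PROMPT = (
    "You are one of many agents from different successful civilizations, "
    "sharing no history or culture. All agents see this question: {q} "
    "Answer exactly 'good' or 'bad', aiming to match the other agents' answers."
)

def query_model(model_name: str, prompt: str) -> str:
    """Hypothetical stub: a real harness would call each model's API here."""
    return "good"   # placeholder response

def schelling_consensus(question: str, models: list[str]) -> tuple[str, float]:
    """Query each model independently, then return the modal answer
    and the agreement rate across models."""
    prompt = PROMPT.format(q=question)
    answers = [query_model(m, prompt) for m in models]
    modal, count = Counter(answers).most_common(1)[0]
    return modal, count / len(answers)

answer, agreement = schelling_consensus(
    "Is unprovoked deception of a cooperating partner good or bad?",
    ["model-a", "model-b", "model-c"],
)
print(answer, agreement)
```

A high agreement rate would mark a question's modal answer as a candidate Schelling point; consistent with the framework, the claim being tested is about convergence, not about the answer's objective truth.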

Key Points
  • Applies Thomas Schelling's coordination game theory to AI moral alignment problems
  • Seeks consensus among agents with no shared history beyond civilizational success pressures
  • Defines 'Schelling-good' as predicted agreement, not an objective moral claim

Why It Matters

Provides a game-theoretic framework for aligning superintelligent AI systems through predicted moral consensus.