Milder Temperature Makes a Hell Stable
A viral thought experiment shows how equilibria can be made robust to defection, with lessons for designing stable multi-agent AI systems and avoiding catastrophic coordination failures.
A viral LessWrong post by Joachim Bartosik titled 'Milder Temperature Makes a Hell Stable' has sparked discussion in AI alignment and game theory circles. The post presents a thought experiment in which 100 agents repeatedly choose numbers between 30 and 100, and everyone then experiences the average of those numbers as a temperature in degrees Celsius. The original setup has a 'hellish' Nash equilibrium at 99°C, where no single agent can improve its outcome by deviating. Bartosik shows that this equilibrium is nonetheless fragile: once one agent defects to 30, the punishment mechanism saturates, so the remaining agents can choose lower numbers without further penalty, and the system collapses to a more comfortable 30°C average.
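To see the fragility concretely, here is a minimal sketch. The post's exact payoff rules aren't reproduced in full here, so the punishment model below (each defector triggers one degree of collective punishment, capped at 100°C) is an assumption chosen to match the described behavior; it illustrates the saturation logic, not the full collapse dynamics:

```python
# A minimal sketch of the fragility argument, NOT the post's exact payoff
# rules: assume each defector from the 99 °C equilibrium triggers one degree
# of collective punishment, capped at 100 °C.

def hell_temperature(defectors: int, base: float = 99.0, cap: float = 100.0) -> float:
    """Temperature everyone experiences under the assumed punishment rule."""
    return min(cap, base + defectors)

# The marginal cost of one more defection drops to zero once the penalty saturates.
for d in range(3):
    now, nxt = hell_temperature(d), hell_temperature(d + 1)
    print(f"defectors={d}: temp={now:.0f} °C, next defection costs {nxt - now:.0f} °C")
# defectors=0: temp=99 °C, next defection costs 1 °C  -> deviating is punished
# defectors=1: temp=100 °C, next defection costs 0 °C -> punishment saturated
# defectors=2: temp=100 °C, next defection costs 0 °C -> defection is now free
```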
The key insight is that this fragility can be fixed by changing the rules to create a 'milder but more stable hell.' Bartosik proposes a modified temperature rule, min(100, 99 - m + d), where m is the number of defectors needed to saturate the penalty and d is the number of agents currently defecting. At d = 0 the temperature is a milder 99 - m, and each defection raises it by one degree, so with m = 30 the system withstands up to 30 defectors before the penalty saturates and the equilibrium can collapse. The discussion extends to thermodynamic game theory: commenter James Camacho notes that a lower 'temperature' in softmax decision-making still lets agents escape bad equilibria, though the escape takes exponentially longer. The model offers useful guidance for designing stable multi-agent AI systems in which coordination failures could be catastrophic, a concern that grows as AI agents become more autonomous and interconnected.
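Because the modified rule is given explicitly, its behavior can be tabulated directly (a small Python sketch; the function name is illustrative, and m = 30 follows the example in the discussion):

```python
# The modified rule as stated in the post: temperature = min(100, 99 - m + d),
# with m the number of defectors needed to saturate the penalty and d the
# current number of defectors.

def milder_hell_temperature(d: int, m: int = 30) -> int:
    return min(100, 99 - m + d)

for d in (0, 1, 15, 30, 31, 40):
    print(f"d={d:>2}: temp={milder_hell_temperature(d)} °C")
# d= 0: 69 °C   the 'milder' baseline of 99 - m
# d=30: 99 °C   each defection still costs everyone a degree
# d=31: 100 °C  penalty saturated; only now can further defection go unpunished
```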
- Original 'hell' equilibrium at 99°C breaks when one agent defects to 30, saturating the penalty and freeing others to cooperate
- Modified rule (temperature = min(100, 99 - m + d)) keeps the system robust to up to m = 30 defecting agents before collapse
- Thermodynamic game theory connection: lower 'temperature' in softmax decision-making helps agents escape bad equilibria but takes exponentially longer (see the sketch below)
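To make that last point concrete, here is a minimal sketch of softmax (Boltzmann) action selection with hypothetical payoff numbers; the two-action setup and the specific values are illustrative, not from the post. The probability of the equilibrium-escaping move falls roughly like exp(-gap/T), which is one way the 'exponentially longer' escape time shows up:

```python
import math

def softmax_probs(payoffs, temperature):
    """Boltzmann/softmax distribution over actions given their payoffs."""
    # Subtract the max logit for numerical stability before exponentiating.
    logits = [p / temperature for p in payoffs]
    m = max(logits)
    weights = [math.exp(l - m) for l in logits]
    total = sum(weights)
    return [w / total for w in weights]

# Hypothetical payoffs: action 0 = stay in the bad equilibrium (locally best),
# action 1 = the deviation that could unlock a better outcome.
payoffs = [1.0, 0.9]
for T in (1.0, 0.1, 0.01):
    p_escape = softmax_probs(payoffs, T)[1]
    print(f"T={T:<5} P(escape move) = {p_escape:.2e}")
# P(escape) shrinks roughly like exp(-gap/T), so the expected time to escape
# a bad equilibrium grows exponentially as the decision temperature is lowered.
```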
Why It Matters
Provides frameworks for designing stable multi-agent AI systems where coordination failures could lead to catastrophic real-world consequences.