Milder Temperature Makes a Hell Stable
A viral thought experiment shows how equilibria can be made robust to defection, with lessons for designing stable multi-agent AI systems and avoiding catastrophic coordination failures.
A viral LessWrong post by Joachim Bartosik titled 'Milder Temperature Makes a Hell Stable' has sparked discussion in AI alignment and game theory circles. The post presents a thought experiment in which 100 agents repeatedly choose numbers between 30 and 100, and everyone then experiences the average of those numbers as a temperature in degrees Celsius. The original setup has a 'hellish' Nash equilibrium at 99°C, where no single agent can improve its outcome by deviating. Bartosik shows that this equilibrium is nonetheless fragile: once one agent defects to 30, the punishment mechanism saturates, so the remaining agents can choose lower numbers without further penalty, and the system collapses to a more comfortable 30°C average.
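To see the fragility concretely, here is a minimal sketch. The post's exact payoff rules aren't reproduced in full here, so the punishment model below (each defector triggers one degree of collective punishment, capped at 100°C) is an assumption chosen to match the described behavior; it illustrates the saturation logic, not the full collapse dynamics:

```python
# A minimal sketch of the fragility argument, NOT the post's exact payoff
# rules: assume each defector from the 99 °C equilibrium triggers one degree
# of collective punishment, capped at 100 °C.

def hell_temperature(defectors: int, base: float = 99.0, cap: float = 100.0) -> float:
    """Temperature everyone experiences under the assumed punishment rule."""
    return min(cap, base + defectors)

# The marginal cost of one more defection drops to zero once the penalty saturates.
for d in range(3):
    now, nxt = hell_temperature(d), hell_temperature(d + 1)
    print(f"defectors={d}: temp={now:.0f} °C, next defection costs {nxt - now:.0f} °C")
# defectors=0: temp=99 °C, next defection costs 1 °C  -> deviating is punished
# defectors=1: temp=100 °C, next defection costs 0 °C -> punishment saturated
# defectors=2: temp=100 °C, next defection costs 0 °C -> defection is now free
```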
The key insight is that this fragility can be fixed by changing the rules to create a 'milder but more stable hell.' Bartosik proposes a modified temperature rule, min(100, 99 - m + d), where m is the number of defectors needed to saturate the penalty and d is the number of agents currently defecting. At d = 0 the temperature is a milder 99 - m, and each defection raises it by one degree, so with m = 30 the system withstands up to 30 defectors before the penalty saturates and the equilibrium can collapse. The discussion extends to thermodynamic game theory: commenter James Camacho notes that a lower 'temperature' in softmax decision-making still lets agents escape bad equilibria, though the escape takes exponentially longer. The model offers useful guidance for designing stable multi-agent AI systems in which coordination failures could be catastrophic, a concern that grows as AI agents become more autonomous and interconnected.
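Because the modified rule is given explicitly, its behavior can be tabulated directly (a small Python sketch; the function name is illustrative, and m = 30 follows the example in the discussion):

```python
# The modified rule as stated in the post: temperature = min(100, 99 - m + d),
# with m the number of defectors needed to saturate the penalty and d the
# current number of defectors.

def milder_hell_temperature(d: int, m: int = 30) -> int:
    return min(100, 99 - m + d)

for d in (0, 1, 15, 30, 31, 40):
    print(f"d={d:>2}: temp={milder_hell_temperature(d)} °C")
# d= 0: 69 °C   the 'milder' baseline of 99 - m
# d=30: 99 °C   each defection still costs everyone a degree
# d=31: 100 °C  penalty saturated; only now can further defection go unpunished
```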
- Original 'hell' equilibrium at 99°C breaks when one agent defects to 30, saturating the penalty and freeing others to cooperate
- Modified rule (temperature = min(100, 99 - m + d)) keeps the system robust to up to m = 30 defecting agents before collapse
- Thermodynamic game theory connection: lower 'temperature' in softmax decision-making helps agents escape bad equilibria but takes exponentially longer (see the sketch below)
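To make that last point concrete, here is a minimal sketch of softmax (Boltzmann) action selection with hypothetical payoff numbers; the two-action setup and the specific values are illustrative, not from the post. The probability of the equilibrium-escaping move falls roughly like exp(-gap/T), which is one way the 'exponentially longer' escape time shows up:

```python
import math

def softmax_probs(payoffs, temperature):
    """Boltzmann/softmax distribution over actions given their payoffs."""
    # Subtract the max logit for numerical stability before exponentiating.
    logits = [p / temperature for p in payoffs]
    m = max(logits)
    weights = [math.exp(l - m) for l in logits]
    total = sum(weights)
    return [w / total for w in weights]

# Hypothetical payoffs: action 0 = stay in the bad equilibrium (locally best),
# action 1 = the deviation that could unlock a better outcome.
payoffs = [1.0, 0.9]
for T in (1.0, 0.1, 0.01):
    p_escape = softmax_probs(payoffs, T)[1]
    print(f"T={T:<5} P(escape move) = {p_escape:.2e}")
# P(escape) shrinks roughly like exp(-gap/T), so the expected time to escape
# a bad equilibrium grows exponentially as the decision temperature is lowered.
```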
Why It Matters
Provides frameworks for designing stable multi-agent AI systems where coordination failures could lead to catastrophic real-world consequences.