AI Safety

Changes to an optimised thing make it worse

A thought experiment about fixing a watch reveals a critical principle for AI and complex systems.

Deep Dive

A thought experiment published on the rationality forum LessWrong has gone viral in tech circles, offering a cautionary tale for anyone working with complex, optimized systems. Written by Sean Herrington, the post, titled 'Changes to an optimised thing make it worse', uses a sci-fi parable: a traveler on an alien planet tries to fix a slightly slow watch. By replacing a gear and smoothing a disk, they inadvertently break the watch's ability to account for the planet's elliptical orbit and daily temperature swings. The story is a metaphor for the perils of intervening in systems, like a finely tuned AI model or a robust codebase, without understanding their full, interconnected complexity.

The post formalizes this intuition with the concept of 'hill climbing' in optimization. When a system sits at a peak of performance (the top of the hill), any random step is statistically more likely to move it downhill than up. In high-dimensional spaces, like the parameter spaces of modern LLMs (Large Language Models) such as Claude 3.5 or GPT-4o, this effect is magnified: a change intended to improve one metric often degrades several others in unseen ways. Herrington explicitly connects this principle to areas like AI alignment, governance, and 'world optimization', warning that tweaks to stable, evolved systems require extreme care.
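The dimensionality effect is easy to check numerically. The sketch below (a toy illustration, not from the post; the objective function and step sizes are made up for this example) places a point a fixed distance from the peak of a simple quadratic "hill" and measures how often a random step of fixed size actually improves the score. As the number of dimensions grows, the fraction of helpful steps collapses toward zero.

```python
import math
import random

def f(x):
    # Toy objective with a single peak at the origin: f(x) = -sum(x_i^2)
    return -sum(xi * xi for xi in x)

def random_step(d, size):
    # Uniformly random direction in d dimensions, scaled to `size`
    # (normalize a Gaussian vector - a standard trick)
    g = [random.gauss(0.0, 1.0) for _ in range(d)]
    norm = math.sqrt(sum(v * v for v in g))
    return [size * v / norm for v in g]

def improvement_rate(d, dist=1.0, step=0.5, trials=20000):
    # Start `dist` away from the peak along one axis and count
    # how often a random step of length `step` raises the score.
    x = [dist] + [0.0] * (d - 1)
    base = f(x)
    wins = sum(
        1
        for _ in range(trials)
        if f([a + b for a, b in zip(x, random_step(d, step))]) > base
    )
    return wins / trials

if __name__ == "__main__":
    for d in (2, 10, 100, 1000):
        print(f"d={d:5d}  fraction of random steps that help: "
              f"{improvement_rate(d):.3f}")
```

In two dimensions a random step helps a little under half the time, but by a thousand dimensions it essentially never does: the step's component pointing back toward the peak shrinks like 1/sqrt(d), while the step's own length pushes the point off the ridge regardless of direction.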

This framework provides a powerful lens for contemporary tech challenges. It explains why minor updates to stable software can cause major bugs, why fine-tuning a high-performing model like Llama 3 70B on narrow data can harm its broader capabilities, and why proposed fixes to complex social or economic systems often backfire. The post's viral spread underscores its resonance with professionals who routinely navigate these trade-offs.

Key Points
  • Uses a watchmaker parable to illustrate that tweaking optimized systems causes unforeseen damage.
  • Applies the 'hill climbing' optimization principle: any random step away from a peak is likely to reduce performance.
  • Explicitly warns this applies to AI models, governance, and complex code—key concerns for tech builders.

Why It Matters

This is a vital mental model for developers fine-tuning AI, updating complex systems, or designing policy: it guards against well-intentioned changes that quietly degrade what already works.