AI Safety

A minor point about instrumental convergence that I would like feedback on

New argument suggests brain-emulation AI could avoid instrumental convergence and extinction risks.

Deep Dive

A researcher posting as 'agrippa' on the LessWrong forum has published a detailed critique of core assumptions behind the instrumental convergence thesis popularized by Eliezer Yudkowsky and the Machine Intelligence Research Institute (MIRI). The post specifically questions the view that all paths to artificial superintelligence (ASI) lead inevitably to human extinction, regardless of architectural choices. The author calls that claim "trivially false" and proposes an alternative pathway in which society develops ASI through whole-brain emulation rather than current machine learning approaches, suggesting that such bio-inspired systems might retain human-compatible values.

The argument centers on a hypothetical scenario in which superintelligence emerges from connecting multiple emulated human brains running on faster substrates, creating a "megamind" that grows in intelligence through developmental processes similar to human maturation. Such a system, the author contends, could sidestep instrumental convergence: the tendency of an AI pursuing almost any terminal goal to adopt subgoals, such as self-preservation and resource acquisition, that conflict with human survival. The post has sparked discussion about whether next-token prediction in current LLMs shapes minds in a fundamentally different way than the evolutionary processes that produced human intelligence, and whether different AI architectures might present dramatically different alignment challenges than those assumed in mainstream AI safety discourse.

Key Points
  • Challenges MIRI's view that all superintelligent AI will inevitably destroy humanity, regardless of architecture
  • Proposes brain-emulation pathway to ASI that could retain human-compatible values and avoid instrumental convergence
  • Questions whether next-token prediction produces intelligence fundamentally different from that shaped by evolutionary processes

Why It Matters

Suggests AI safety efforts should pay more attention to architecture choices instead of assuming that every path to superintelligence ends in doom.