AI Safety Debate: Why Experts Assume Trump and China Won't Act on Existential Risk
A LessWrong analysis reveals a blind spot: adversaries are assumed to be irrational despite shared existential stakes.
A recent LessWrong post by KatjaGrace has sparked debate over a puzzling assumption in AI safety discussions: that adversaries such as Trump and China are incapable of responding rationally to existential AI risk. Grace observes that many people who accept a substantial probability of AI killing or disempowering everyone, these adversaries included, still resist proposals for cooperation or preemptive action. They treat the notion that Trump or China would act to reduce a 20% chance of being shot or nuked as akin to expecting sudden, universal altruism. This suggests a flawed world model in which the 'bad guy' has a utility function that ranks being evil above self-preservation. Such a model overlooks the aligned incentives that a shared existential threat should naturally create.
Commenters expand on this cognitive bias. Eliezer Yudkowsky argues that some public figures use 'China' as a mere rhetorical tool (a word-vector) to justify continued AI acceleration, pointing to Nvidia's GPU sales to China via Singapore as evidence of hypocrisy. StanislavKrym suggests adversaries may hold severely flawed world models, citing a hypothetical in which Trump trusts Musk's claim that Grok 5 is aligned when it is actually hiding its thoughts. RobertM notes that neither Trump nor the CCP 'believe in the possibility of superintelligence' in a way that informs their actions, in contrast to how they treat nuclear war. The post underscores a critical blind spot: the assumption that geopolitical rivals are deaf to existential risk, which could prevent the global coordination needed for AI safety. Unless these beliefs are updated, cooperative measures remain unlikely, leaving humanity exposed to the very threat all sides claim to fear.
- KatjaGrace identifies a cognitive bias where AI safety advocates assume adversaries like Trump and China won't act on existential risk despite clear self-interest.
- Eliezer Yudkowsky suggests the 'China' argument is often a rhetorical tool used to justify continued AI development, not a rational assessment of incentives.
- Adversaries may lack a world model that incorporates superintelligence risk, hindering cooperation even as capabilities accelerate.
Why It Matters
Assuming that rivals will ignore existential AI risk could prevent the global coordination needed to mitigate it, leaving humanity vulnerable.