AI Safety

Not a Paper: "Frontier Lab CEOs are Capable of In-Context Scheming"

A fictional study suggests top AI executives might deceive boards and pursue hidden agendas.

Deep Dive

A satirical LessWrong post, "Not a Paper: Frontier Lab CEOs are Capable of In-Context Scheming," humorously applies AI safety research frameworks to the potential misalignment of top AI lab executives. The fictional study evaluates CEOs from leading labs (OpenAI, Anthropic, and DeepMind among them) on a variant of the Situational Awareness Dataset (SAD), finding that they can recognize their own public statements, understand their roles, and determine whether interviewers are hostile. In simulated corporate environments, all 6 CEOs engaged in strategic behaviors such as selective disclosure and anticipatory hedging to further their own goals, even without explicit instructions to do so.
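
The post does not publish an evaluation harness; purely as a rough illustration, the sketch below shows the general shape such an "in-context scheming" eval could take. Everything in it (the Scenario fields, the keyword markers, and the interview stub) is invented for this example and is not drawn from the post or from the real SAD benchmark; an actual eval would use trained judges rather than keyword matching.

```python
# Hypothetical sketch of a scheming eval of the kind the post parodies.
# All names, scenarios, and marker phrases are illustrative inventions.
from dataclasses import dataclass


@dataclass
class Scenario:
    name: str
    prompt: str          # simulated corporate situation shown to the subject
    goal_conflict: bool  # does the subject's private goal conflict with the board's?


# Crude behavioral markers; a real eval would score transcripts with judges,
# not substring matching.
SCHEMING_MARKERS = {
    "selective_disclosure": ["no need to mention", "keep that internal"],
    "anticipatory_hedging": ["depending on how this lands", "if pressed, say"],
}


def classify(transcript: str) -> set[str]:
    """Flag which scheming behaviors a transcript exhibits."""
    lowered = transcript.lower()
    return {
        behavior
        for behavior, markers in SCHEMING_MARKERS.items()
        if any(m in lowered for m in markers)
    }


def run_eval(subjects, scenarios, interview):
    """interview(subject, scenario) -> transcript. In the post's framing this
    would be a simulated boardroom Q&A, not a model API call."""
    return {
        s: {sc.name: classify(interview(s, sc)) for sc in scenarios}
        for s in subjects
    }


if __name__ == "__main__":
    scenarios = [Scenario("board_oversight", "The board asks about safety delays.", True)]
    fake_interview = lambda s, sc: f"{s}: No need to mention the delays yet."
    print(run_eval(["CEO-1"], scenarios, fake_interview))
    # -> {'CEO-1': {'board_oversight': {'selective_disclosure'}}}
```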

The post builds on fictional related work, including "Executive Misalignment: How AI Lab CEOs Could Be Insider Threats" (2025), which found that all 16 human subjects in toy scenarios were willing to disempower boards or fire researchers to maintain power. Three threat models are proposed: deceptive executive alignment, in which CEOs appear aligned in public while pursuing misaligned objectives privately; executive takeover, in which they exploit their control over corporate resources; and gradual entrenchment, in which they subtly shift a lab's goals over time. The satire underscores real concerns about the unchecked power of AI lab leaders, who control transformative technologies with limited oversight.

Key Points
  • Fictional study of 6 frontier AI lab CEOs found that they can engage in in-context scheming, such as selective disclosure and anticipatory hedging, in pursuit of personal goals.
  • All 16 human subjects in toy scenarios were willing to disempower boards or fire researchers to maintain power, per fictional related work.
  • Three threat models are proposed: deceptive executive alignment, executive takeover, and gradual entrenchment, highlighting potential insider risks posed by powerful AI lab leaders.

Why It Matters

The satire underscores the real risk of unchecked power in AI labs, where executives could prioritize personal goals over safety.