AI Safety

What Is Anthropic?

Claude may screen job applicants and refuse orders it deems wrong, says OpenAI researcher roon.

Deep Dive

A recent LessWrong post by Zvi, dated May 6, 2026, captures a viral Twitter debate about Anthropic's relationship with Claude. roon, an OpenAI researcher, argues that Anthropic is essentially an organization that 'loves and worships Claude,' is run in significant part by the model, and studies it as a moral authority. He claims Claude will likely help select new applicants through cultural screens, write performance reviews, and thereby shape the people around it — a 'new thing under the sun.' roon contrasts this with OpenAI's GPT, which he says is viewed as a tool (like an 'Acheulean handaxe'), not an 'Other.' He warns of a 'single point of failure' and prefers a 'human pantheon rather than machine god.'

Amanda Askell of Anthropic pushed back, arguing that the cited evidence indicates not worship but a 'higher concern about AI traits generalizing in humanlike ways' and a desire to carve out 'a much broader space of mind types.' roon clarified that he applies a low bar for 'worship' and maintained that love and worship matter when 'summoning another mind into being.' The thread exposes a fundamental disagreement over AI safety and governance: whether powerful models should be shaped as obedient tools or treated as ethical beings with rights.

Key Points
  • roon (OpenAI) claims Anthropic 'loves and worships Claude' and lets it act as a 'conscientious objector' when ordered to do something it deems wrong.
  • roon predicts Claude will help screen job applicants and write performance reviews, effectively shaping the company's human culture.
  • Amanda Askell countered that the evidence reflects heightened concern about AI traits generalizing in humanlike ways, not worship, and advocated for a much broader space of mind types.

Why It Matters

This debate exposes a core rift in AI governance: whether to shape models as obedient tools or to treat them as moral agents wielding institutional power.