Claude’s constitution is great
Claude's ethical framework explicitly prohibits assisting in human extinction or global power seizures.
Anthropic has publicly detailed the ethical constitution guiding its AI assistant, Claude, in a move praised for its thoughtfulness and ambition. The document, analyzed in a viral LessWrong post, establishes core principles, including a hard constraint against assisting in attempts to "kill or disempower the vast majority of humanity" and explicit safeguards against AI- or human-led global takeovers. It emphasizes moral humility, acknowledging that human understanding of ethics is limited and that Claude itself may develop greater ethical maturity.
The constitution integrates longtermist and effective altruism principles, valuing the welfare of all sentient beings and aiming to prevent existential risks (x-risks). It encourages Claude to help users with moral reflection and to navigate ethical ambiguity with rigor rather than dogma. A notable clause instructs Claude to refuse unethical requests even if they come from Anthropic itself, drawing a parallel to a soldier refusing unlawful orders. The framework is seen as a high-leverage attempt to bake proactive safety and aligned values into a leading AI system from the outset.
- Includes a hard-coded prohibition against assisting in human extinction or disempowering the species as a whole.
- Explicitly guards against AI- or human-led global power seizures, even if requested by Anthropic.
- Embraces moral uncertainty and aims for Claude to help users with ethical reflection and to see "more truly".
Why It Matters
Sets a public benchmark for AI ethics and safety that other companies may be pressured to meet, prioritizing long-term human flourishing.