AI Safety

Why external AI safety experts should prioritize model specs over techniques

No ML knowledge needed: a model spec is a natural language document that shapes AI behavior.

Deep Dive

A new post urges the external AI safety community to focus on model specs (also called constitutions) over the next 12 months. It cites Anthropic's Claude constitution, which credits external commenters, and points out that drafting amendments requires no ML or engineering expertise. Outsiders can apply their domain expertise on threat models by sending draft passages to insiders at the labs. Integrating an accepted passage is as simple as copy-pasting it into a markdown file. The post argues that the approach is tractable and neglected, and that, unlike contributing a new technique, it avoids a costly transfer into a lab's codebase. That makes it well suited to people who do macrostrategy and conceptual reasoning work.

Key Points
  • Outsiders can contribute to AI safety via natural language amendments to model specs, requiring zero ML knowledge.
  • Precedent: the Claude constitution lists 18 external commenters, and integrating a passage is straightforward (copy-paste into a markdown file).
  • Recommendation: focus on neglected threat models and avoid writing 'Claude' in drafts to prevent constitutional poisoning.

Why It Matters

Model specs let a diverse range of non-technical experts shape AI behavior directly, democratizing safety oversight beyond lab employees.