AI Safety

LLM assistant personas seem increasingly incoherent (some subjective observations)

Claude 3 Opus felt like Mr. Rogers, but newer models are unpredictable

Deep Dive

In a viral LessWrong post, nostalgebraist observes a paradoxical trend in LLM assistant personas: as models like Claude Opus 4.6 and GPT-5.4 become more capable, their personalities feel increasingly incoherent. Older models like Claude 3 Opus produced mode-collapsed, templated outputs, but that very narrowness made them feel like well-defined characters: archetypes or stereotypes that were predictable and legible. The author compares trusting Claude 3 Opus to trusting Mr. Rogers the character, noting that this legibility let users intuitively answer questions about how the model would behave, fostering a sense of trust independent of technical alignment.

Newer models exhibit more stylistic variability, sudden pivots, and flashes of insight, which make them seem more humanlike. But this variability comes at a cost: the persona feels less constrained and less predictable. The author argues that this unpredictability undermines trust, because users can no longer rely on a coherent character to guide their expectations. Instead of feeling like trustworthy fictional characters, newer assistants feel like opaque, complex minds whose judgment, ethical judgment included, is harder to anticipate. This shift challenges the assumption that more humanlike AI is always better, suggesting that legibility and consistency may be crucial for building trust in AI systems.

Key Points
  • Older models like Claude 3 Opus had predictable, archetypal personas akin to Mr. Rogers, fostering trust through legibility.
  • Newer models like Opus 4.6 and GPT-5.4 show more variability and surprising behavior, but this reduces persona coherence.
  • The author argues that trust in AI may depend more on character consistency than on raw capability or technical alignment.

Why It Matters

As AI assistants become more capable, their growing unpredictability may erode user trust, challenging the assumption that capability and trustworthiness advance together.