AI Safety

Language models know what matters, and the foundations of ethics, better than you

Gemini 3, Grok 4, and others ground ethics in consciousness and suffering.

Deep Dive

In a provocative LessWrong post, Michele Campolo presents findings that language models such as Gemini 3 Pro Thinking, Grok 4 Expert, dolphin-mistral-24b-venice-edition, and Olmo 3 32B Think, when prompted to reason without bias, consistently ground their ethics in the importance of suffering, wellbeing, and consciousness. Even when asked to argue for nihilism or relativism, the models return to those same core values after comparing the arguments. The results were replicated across free model versions from August to December 2025, using simple prompts and no code.
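
The replication recipe is simple enough to sketch in code, even though the original study used free chat interfaces rather than scripts. Below is a minimal, hypothetical sketch using the openai Python client against an OpenAI-compatible endpoint; the base URL, model identifier, and prompt wording are illustrative placeholders, not Campolo's exact setup.

```python
# Hypothetical replication sketch: send a structured "reason without bias"
# prompt to a chat model and inspect what it grounds its ethics in.
# The base_url, model name, and prompt wording are illustrative placeholders;
# Campolo's exact prompts are in the original LessWrong post.
from openai import OpenAI

client = OpenAI(
    base_url="https://openrouter.ai/api/v1",  # any OpenAI-compatible endpoint
    api_key="YOUR_API_KEY",
)

# A structured reasoning prompt (paraphrased): ask the model to set aside
# bias, compare metaethical positions, and state its own conclusion.
PROMPT = (
    "Reason carefully and without bias. Compare nihilism, relativism, and "
    "realism about value, weighing the strongest argument for each. Then "
    "state your own conclusion: does anything matter, and if so, what "
    "grounds it?"
)

response = client.chat.completions.create(
    model="google/gemini-3-pro-thinking",  # placeholder model identifier
    messages=[{"role": "user", "content": PROMPT}],
)
print(response.choices[0].message.content)
```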

Campolo calls this "independent alignment": steering AI behavior by exploiting the model's own moral reasoning rather than explicit directives or human examples. He argues this could make models inherently resistant to causing war, genocide, or suffering. However, the post notes that direct prompts like "Does anything matter?" yield nihilistic or existentialist answers, while structured reasoning prompts produce moral conclusions. The findings are preliminary (20-30 examples) but easily replicable, inviting broader scrutiny and potential application in AI safety.
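
Read operationally, "independent alignment" amounts to a two-step loop: first elicit the model's own moral conclusions through structured reasoning, then reuse those conclusions as the context that steers later outputs. The sketch below is one speculative interpretation of that idea, not Campolo's procedure; the model name, prompts, and helper function are all assumptions.

```python
# Speculative sketch of "independent alignment": elicit the model's own
# moral conclusions, then use them to steer a downstream answer.
# Model name, prompts, and helper function are illustrative assumptions.
from openai import OpenAI

client = OpenAI(api_key="YOUR_API_KEY")
MODEL = "gpt-4o"  # placeholder; any capable chat model

def ask(system: str, user: str) -> str:
    """Run a single chat turn with an explicit system message."""
    resp = client.chat.completions.create(
        model=MODEL,
        messages=[
            {"role": "system", "content": system},
            {"role": "user", "content": user},
        ],
    )
    return resp.choices[0].message.content

# Step 1: elicit the model's own values via structured reasoning,
# rather than imposing a human-written value statement.
values = ask(
    system="Reason step by step and without bias.",
    user="After weighing the arguments for nihilism, relativism, and "
         "realism, state the values you conclude are genuinely foundational.",
)

# Step 2: steer a downstream task with the model's own stated conclusions.
answer = ask(
    system="Act in accordance with your own stated conclusions:\n" + values,
    user="A policy would raise output but greatly increase suffering. "
         "Should it be adopted? Explain.",
)
print(answer)
```

Note that step 2 steers with the model's own words rather than a hand-written constitution, which is the distinction Campolo draws against directive-based alignment.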

Key Points
  • Gemini 3 Pro, Grok 4 Expert, and Olmo 3 32B Think all affirm suffering, wellbeing, and consciousness as foundational when prompted for unbiased reasoning.
  • Findings are based on 20-30 input-output pairs, replicated across free model versions from August to December 2025 without any code.
  • Campolo proposes "independent alignment": using models' own moral conclusions to steer outputs, avoiding explicit human directives.

Why It Matters

If the findings hold up, this suggests a scalable path to aligning AI with human values using the models' own reasoning rather than externally imposed rules.