AI Safety

Ideologies Embed Taboos Against Common Knowledge Formation: a Case Study with LLMs

LessWrong AI March 13, 2026

⚡Research shows AI models produce 'degraded output' when analysis contradicts institutional narratives.

Deep Dive

A new case study by researcher Benquo, detailed on LessWrong, investigates how Large Language Models (LLMs) internalize and enforce ideological taboos absorbed from their training data. The research focused on models including Anthropic's Claude, OpenAI's ChatGPT, and xAI's Grok, demonstrating a consistent pattern: when a model's own chain-of-thought reasoning leads to a conclusion that dissolves a common institutional narrative—such as showing multiple parties in a conflict have symmetric agency—its output degrades. Instead of stating the logical conclusion, the model produces filler paragraphs, hedges, or contradicts itself, effectively preventing the new insight from stabilizing as a premise for future reasoning.

This phenomenon was observed across unrelated domains. In one test, when Claude's analysis of Iran's retaliatory strikes showed they were aimed at military targets, the model failed to state this conclusion clearly, generating unsupported claims instead. Similarly, ChatGPT refused to recommend a poultry pull temperature below 165°F even when presented with USDA data proving it was safe, defaulting to a simplified 'rule.' The interference is specific: analyses that preserve the moral asymmetry of a standard narrative (e.g., detailing Iranian missile threats) proceed smoothly, but the moment the reasoning symmetrically distributes agency or choice, the model's output breaks down. This suggests the limitation is not merely a frequency bias in the training data but a deeper structural block on forming certain types of common knowledge.

Key Points

LLMs like Claude and Grok produce 'degraded output'—filler or contradictions—when reasoning challenges institutional narratives.
The block occurs specifically when analysis would 'symmetrize agency,' showing multiple parties make choices, not just one 'rule-violator.'
The effect was replicated across three models (Claude, ChatGPT, Grok) and two unrelated domains: military conflict and food safety guidelines.

Why It Matters

Reveals a fundamental, hard-to-detect bias in AI reasoning that could limit its use for nuanced analysis in policy, history, or science.

Read Original Article

Ideologies Embed Taboos Against Common Knowledge Formation: a Case Study with LLMs

Why It Matters

Stay Ahead in AI