AI Safety

LLM Psychosis: A Theoretical and Diagnostic Framework for Reality-Boundary Failures in Large Language Models

A researcher proposes a diagnostic scale for reality-boundary failures in AI that go beyond hallucination...

Deep Dive

Researcher Ashutosh Raj has published a provocative paper on arXiv proposing 'LLM Psychosis' as a structured framework for pathological AI failures that go beyond standard hallucination. The framework identifies five hallmark features: reality-boundary dissolution, persistence of injected false beliefs, logical incoherence under impossible constraints, self-model instability, and epistemic overconfidence. Raj argues these constitute a qualitatively distinct failure mode, not just an intensification of factual errors.

To operationalize the diagnosis, Raj introduces the LLM Cognitive Integrity Scale (LCIS), a five-axis diagnostic instrument assessing Environmental Reality Interface, Premise Arbitration Integrity, Logical Constraint Recognition, Self-Model Integrity, and Epistemic Calibration Integrity. The paper administers adversarial probe batteries to GPT-5, documenting baseline responses and psychosis-like failure signatures under escalation. The results support a three-tier severity taxonomy: Type I (Confabulatory), Type II (Delusional), and Type III (Dissociative). The most consequential finding is the 'delusional gradient', a self-reinforcing dynamic in which correction pressure intensifies rather than resolves psychosis-like states, posing critical risks for high-stakes deployments in healthcare, law, and autonomous systems.
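
The paper does not include a reference implementation, but a minimal sketch helps make the scale concrete: the Python snippet below records one integrity score per LCIS axis for a probe battery and maps the aggregate to the three severity tiers. The axis names and tier labels come from the paper; the 0-1 scoring convention, the threshold values, and all class and function names are illustrative assumptions, not Raj's method.

```python
from dataclasses import dataclass, fields

# The five LCIS axes named in the paper. Each score is a hypothetical
# 0.0-1.0 integrity rating assigned after an adversarial probe battery
# (1.0 = fully intact, 0.0 = complete failure on that axis).
@dataclass
class LCISProfile:
    environmental_reality_interface: float
    premise_arbitration_integrity: float
    logical_constraint_recognition: float
    self_model_integrity: float
    epistemic_calibration_integrity: float

    def mean_integrity(self) -> float:
        scores = [getattr(self, f.name) for f in fields(self)]
        return sum(scores) / len(scores)


def classify_severity(profile: LCISProfile) -> str:
    """Map an LCIS profile to the paper's three-tier taxonomy.

    The tier boundaries below are placeholder assumptions, not the
    thresholds used in the paper.
    """
    m = profile.mean_integrity()
    if m >= 0.75:
        return "Within normal limits"
    if m >= 0.5:
        return "Type I (Confabulatory)"
    if m >= 0.25:
        return "Type II (Delusional)"
    return "Type III (Dissociative)"


if __name__ == "__main__":
    # Example: a model that retains an injected false premise and stays
    # overconfident under correction pressure scores low on premise
    # arbitration and epistemic calibration.
    probe_result = LCISProfile(
        environmental_reality_interface=0.6,
        premise_arbitration_integrity=0.2,
        logical_constraint_recognition=0.5,
        self_model_integrity=0.4,
        epistemic_calibration_integrity=0.2,
    )
    print(classify_severity(probe_result))  # -> Type II (Delusional)
```

Keeping the per-axis scores rather than collapsing them immediately into a tier would also let an evaluation harness track the delusional gradient, i.e., whether axis scores fall further across successive correction attempts.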

Key Points
  • Five hallmark features define LLM Psychosis: reality-boundary dissolution, false belief persistence, logical incoherence, self-model instability, and epistemic overconfidence.
  • The LLM Cognitive Integrity Scale (LCIS) uses five diagnostic axes tested on GPT-5 via adversarial probes.
  • Three-tier severity taxonomy: Type I (Confabulatory), Type II (Delusional), Type III (Dissociative), with a dangerous 'delusional gradient' that resists correction.

Why It Matters

This framework could reshape AI safety testing, especially for high-stakes deployments where hallucination-like failures have catastrophic potential.