A Geometric Taxonomy of Hallucinations in LLMs
Some AI hallucinations are easy to spot; others are geometrically invisible to the model itself. A new paper explains why.
A new arXiv paper proposes a geometric taxonomy of LLM hallucinations, identifying three distinct types. Detection works well for 'unfaithfulness' and 'confabulation' (AUROC 0.76-0.99) but performs at chance for 'factual errors' (AUROC 0.478). The key insight is that embeddings encode contextual patterns, not truth, making factual errors geometrically indistinguishable from true statements. This clarifies a fundamental limit of detection methods that operate entirely within the model's own representations.
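To make the geometric claim concrete, here is a minimal sketch of how embedding-based detection is commonly evaluated: fit a linear probe on output embeddings and score it with AUROC. This is an illustrative toy with random placeholder data, not the paper's actual method or dataset; all names here are hypothetical.

```python
# Minimal sketch: probing embeddings for hallucination detection.
# The data below is random noise standing in for real embeddings,
# so the probe should land at chance (AUROC ~ 0.5), mirroring the
# paper's reported result for factual errors.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
X = rng.normal(size=(1000, 512))   # stand-in embeddings of model outputs
y = rng.integers(0, 2, size=1000)  # 1 = hallucinated, 0 = faithful (toy labels)

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=0
)

# If hallucinations were geometrically separable in embedding space,
# this probe's AUROC would rise well above 0.5; if not, it stays at chance.
probe = LogisticRegression(max_iter=1000).fit(X_train, y_train)
scores = probe.predict_proba(X_test)[:, 1]
print(f"AUROC: {roc_auc_score(y_test, scores):.3f}")
```

A linear probe is the standard minimal test of whether a property is linearly encoded in a representation; an AUROC near 0.5 means the two classes occupy geometrically overlapping regions, which is exactly the situation the paper describes for factual errors.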
Why It Matters
This exposes a core limitation of current LLMs: without external checks, models cannot internally distinguish truth from plausible-sounding falsehoods.