Research & Papers

Saying More Than They Know: A Framework for Quantifying Epistemic-Rhetorical Miscalibration in Large Language Models

LLMs use rhetorical devices like tricolon at nearly twice the rate of human experts, revealing a measurable 'AI signature'.

Deep Dive

A new research paper by Asim Bakhshi introduces a formal framework for detecting a core flaw in modern large language models (LLMs): 'epistemic-rhetorical miscalibration.' The term names the systematic tendency of models like GPT-4 and Claude to deploy persuasive rhetorical techniques—such as tricolons (three-part parallel structures) or performed hesitancy markers—at a rate that far outstrips their genuine grounding in knowledge or evidence. The study proposes a triadic taxonomy of 'epistemic-rhetorical markers' and operationalizes it through three composite metrics: Form-Meaning Divergence (FMD), the Genuine-to-Performed Epistemic Ratio (GPR), and Rhetorical Device Distribution Entropy (RDDE).
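The paper's exact formulas are not reproduced here, but an RDDE-style measure can be sketched as the Shannon entropy of rhetorical-device usage within a document. The function name and the device tags below are illustrative, not taken from the paper, and assume some upstream tagger has already labeled each detected device:

```python
import math
from collections import Counter

def device_distribution_entropy(device_labels):
    """Shannon entropy (in bits) of rhetorical-device usage in one document.

    device_labels: hypothetical tagger output, e.g.
    ["tricolon", "hedge", "tricolon", ...].
    """
    counts = Counter(device_labels)
    total = sum(counts.values())
    if total == 0:
        return 0.0
    probs = [c / total for c in counts.values()]
    return -sum(p * math.log2(p) for p in probs)

# A uniform spread over four device types maximizes entropy at 2 bits;
# a skewed, human-like distribution scores much lower.
uniform = ["tricolon", "hedge", "anaphora", "rhetorical_question"]
skewed = ["tricolon"] * 7 + ["hedge"]
print(device_distribution_entropy(uniform))  # 2.0
print(device_distribution_entropy(skewed))   # ~0.544
```

Under this reading, the paper's finding that LLMs spread devices unusually evenly across a text corresponds to consistently high entropy scores, where human writing varies more.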

Applying this framework to a corpus of 225 argumentative texts (approximately 0.6 million tokens) from human experts, non-experts, and LLMs revealed a consistent, model-agnostic 'AI signature.' LLM-generated texts used tricolons at nearly twice the rate of expert human authors (Δ = 0.95) and deployed performed hesitancy markers at twice the human density. Crucially, the FMD metric—which measures the gap between linguistic form and substantive meaning—was significantly elevated in LLM texts compared to both human groups (p < 0.001). Furthermore, LLMs distributed rhetorical devices far more uniformly across documents, lacking the natural variation of human writing.

The findings, grounded in theories from Gricean pragmatics to Brandomian inferentialism, confirm theoretical intuitions about LLM communication. More practically, the annotation pipeline is fully automatable, positioning the framework as a lightweight screening tool. It can flag AI-generated content based on its epistemic overconfidence and stylistic uniformity, offering a theoretically robust feature set for next-generation detection systems that look beyond statistical patterns to deeper communicative flaws.
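A minimal sketch of how the three metrics could be combined into the kind of lightweight screening rule the paper envisions. The thresholds, argument names, and the direction of each comparison are assumptions for illustration only; the paper's reported pattern (high divergence, performed rather than genuine epistemic markers, uniform device spread) motivates the shape of the rule:

```python
def flag_ai_signature(fmd, gpr, rdde,
                      fmd_cutoff=0.5, gpr_cutoff=1.0, rdde_cutoff=1.8):
    """Flag a document whose metric profile matches the reported LLM pattern.

    fmd:  Form-Meaning Divergence (higher = more confident style than substance)
    gpr:  Genuine-to-Performed Epistemic Ratio (lower = more performed hedging)
    rdde: Rhetorical Device Distribution Entropy (higher = more uniform spread)

    All cutoffs are hypothetical placeholders, not values from the paper.
    """
    return fmd > fmd_cutoff and gpr < gpr_cutoff and rdde > rdde_cutoff

print(flag_ai_signature(fmd=0.8, gpr=0.4, rdde=2.1))  # True
print(flag_ai_signature(fmd=0.2, gpr=1.6, rdde=0.9))  # False
```

In a deployed detector these thresholds would presumably be fit on labeled data rather than hand-set, but the rule illustrates why an automatable pipeline makes the framework usable as a screening layer in front of heavier detection systems.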

Key Points
  • LLMs use the tricolon rhetorical device at a rate 95% higher than human experts, creating a measurable stylistic fingerprint.
  • The key 'Form-Meaning Divergence' metric was significantly higher in AI text (p < 0.001), showing a gap between confident style and shallow substance.
  • The framework's pipeline is fully automatable, enabling its use as a lightweight screening tool for AI-generated content detection.

Why It Matters

Provides a new, theory-driven method to detect AI text by measuring its overconfident style, moving beyond simple pattern matching to identify a core LLM flaw.