AI fake news detector achieves 99.6% AUC across unseen prompts
Linguistic features like emotional intensity reveal AI-written fake news, even when prompts change.
A new study on arXiv (paper 2606.04199) tackles the challenge of detecting AI-generated fake news when the prompting strategy varies. Researchers extracted three interpretable linguistic features (lexical diversity, readability scores, and emotional intensity) from real news and from three datasets of AI-generated articles, each produced under a distinct prompt. They trained a Random Forest classifier on one prompt's data and tested it on another, covering all six ordered train-test combinations (three prompts, each paired with the two others). AUC values ranged from 0.988 to 1.000, indicating near-perfect detection regardless of prompt type.
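The cross-prompt protocol described above can be sketched in a few lines of Python. This is a minimal illustration, not the study's code: the function names (`roc_auc`, `evaluate_cross_prompt`), the `datasets` layout, and the pluggable `fit` / `predict_scores` callables are all assumptions made for the example, and how the authors mixed real news into each split is not specified here.

```python
from itertools import permutations


def roc_auc(labels, scores):
    """AUC as the probability that a random positive (label 1) is scored
    higher than a random negative (label 0); ties count as 0.5."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def evaluate_cross_prompt(datasets, fit, predict_scores):
    """Train on one prompt's data, test on another, for every ordered pair.

    datasets: dict mapping a prompt name to (X, y), where X is a list of
              feature vectors and y holds 1 for AI-generated, 0 for real news
              (an assumed layout, chosen for this sketch).
    fit: callable (X, y) -> model.
    predict_scores: callable (model, X) -> list of scores, higher = more AI-like.
    """
    results = {}
    # Three prompts yield 3 * 2 = 6 ordered (train, test) pairs.
    for train_name, test_name in permutations(datasets, 2):
        X_train, y_train = datasets[train_name]
        X_test, y_test = datasets[test_name]
        model = fit(X_train, y_train)
        scores = predict_scores(model, X_test)
        results[(train_name, test_name)] = roc_auc(y_test, scores)
    return results
```

In practice, `fit` and `predict_scores` could wrap scikit-learn's `RandomForestClassifier` (using `predict_proba` for the scores); the pure-Python AUC is there only to keep the sketch free of dependencies.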
The analysis revealed that AI-generated text consistently shows higher lexical diversity, lower readability, and substantially lower emotional intensity than human-written news. Despite distributional shifts across prompts, the classifier maintained strong performance, suggesting these features capture stable properties of AI text. This feature-based approach offers a lightweight, interpretable alternative to deep learning detectors that often fail under prompt variability. For professionals monitoring disinformation, the results suggest the method could serve as a real-time filter that does not need retraining for each new LLM prompt.
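To make the three features concrete, here is a rough sketch of how they might be computed. The specific measures are assumptions, since the summary does not say which ones the paper uses: type-token ratio stands in for lexical diversity, Flesch Reading Ease (with a crude vowel-run syllable count) for readability, and the share of words found in a tiny, hypothetical emotion lexicon for emotional intensity.

```python
import re

# Hypothetical mini-lexicon, for illustration only; a real system would
# rely on a full affect lexicon or a sentiment model.
EMOTION_WORDS = {
    "shocking", "outrage", "terrifying", "disaster",
    "amazing", "furious", "horrific",
}


def tokenize(text):
    """Lowercase the text and keep runs of letters and apostrophes."""
    return re.findall(r"[a-z']+", text.lower())


def count_syllables(word):
    """Crude heuristic: count runs of vowels, with a minimum of one."""
    return max(1, len(re.findall(r"[aeiouy]+", word)))


def extract_features(text):
    """Return the three features for one article as a dict."""
    words = tokenize(text)
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    if not words or not sentences:
        raise ValueError("text must contain at least one word and one sentence")

    # Lexical diversity as type-token ratio: unique words / total words.
    lexical_diversity = len(set(words)) / len(words)

    # Flesch Reading Ease: higher values mean easier text.
    syllables = sum(count_syllables(w) for w in words)
    readability = (
        206.835
        - 1.015 * (len(words) / len(sentences))
        - 84.6 * (syllables / len(words))
    )

    # Emotional intensity as the fraction of words in the lexicon.
    emotional_intensity = sum(w in EMOTION_WORDS for w in words) / len(words)

    return {
        "lexical_diversity": lexical_diversity,
        "readability": readability,
        "emotional_intensity": emotional_intensity,
    }
```

Each article then becomes a three-number vector, which is what keeps a classifier built on these features small and easy to inspect. Note that type-token ratio is sensitive to text length, so comparing articles of very different lengths would call for a length-normalized variant.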
- Random Forest classifier achieved AUC 0.988–1.000 across six cross-prompt train-test combinations.
- AI-generated fake news consistently shows higher lexical diversity, lower readability, and lower emotional intensity than real news.
- Feature-based detection generalizes across unseen prompts, avoiding the brittleness of models trained on a single generation setting.
Why It Matters
The study points to practical, interpretable detection of AI-generated fake news that holds up even as LLM prompting strategies evolve.