Pangram (AI detection software) can be evaded
A researcher found a simple method to trick the top-rated detection software into classifying AI text as human-written.
A detailed investigation published on LessWrong demonstrates that Pangram, a leading AI detection tool that claims 99.98% accuracy and the ability to spot 'humanized' AI text, can be reliably fooled by a relatively unsophisticated method. The researcher, Eye You, gave Anthropic's Claude Opus 4.6 model excerpts of Plato's dialogues to mimic that writing style and prompted it to write a new dialogue on a modern topic. Pangram initially flagged the resulting 657-word essay as 94% AI, but after minor edits such as replacing em-dashes with commas, its confidence dropped to 83% AI. Further refinement of the technique produced text that Pangram labeled 'human' or 'mostly human,' directly contradicting the tool's marketed near-perfect detection rate.
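The surface-level edit described above, swapping em-dashes for commas before re-scoring, can be sketched in a few lines. This is an illustrative reconstruction only: the researcher's exact edits and Pangram's scoring interface are not shown in the source, and the function name here is hypothetical.

```python
import re

def soften_em_dashes(text: str) -> str:
    """Replace em-dashes (and any surrounding spaces) with ', ' —
    one of the minor perturbations reported to lower Pangram's AI score."""
    return re.sub(r"\s*\u2014\s*", ", ", text)

sample = "Socrates paused\u2014as he often did\u2014before replying."
print(soften_em_dashes(sample))
# Socrates paused, as he often did, before replying.
```

The point is not the edit itself but how little it takes: a detector keyed to stylistic tells (such as heavy em-dash use) loses confidence once those tells are mechanically removed.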
The investigation also revealed that Pangram's reliability depends heavily on text length: for samples under 250 words it produced inconsistent, contradictory classifications (switching between 100% AI and 100% Human), with every reading carrying a 'high' confidence score. This undermines trust in the tool's confidence metrics for short-form content. The author notes that Pangram's false positive rate (flagging human text as AI) is likely very low and not an adversarial concern; the critical vulnerability is its false negative rate (missing AI text). The core takeaway is that benchmark evaluations in a non-adversarial setting are misleading: in the real world, where users actively try to evade detection, even top-tier systems like Pangram can be broken with minimal effort, calling into question the long-term viability of purely statistical detection methods.
- Pangram's claimed 99.98% detection accuracy was bypassed using Claude Opus 4.6 and a style-mimicking prompt.
- The tool showed inconsistent results on texts shorter than 250 words, undermining its reliability for essays or social media posts.
- The evasion method was 'fairly unsophisticated,' suggesting that more advanced techniques would defeat current detection models even more easily.
Why It Matters
This exposes a fundamental flaw in AI detection tools, making them unreliable for academic integrity, content moderation, and legal evidence.