LLMs can unmask pseudonymous users at scale with surprising accuracy
Large language model agents achieved up to 68% recall and 90% precision in linking pseudonymous accounts to real identities across platforms.
A new research paper shows that large language model (LLM) agents can systematically deanonymize pseudonymous social media users with alarming accuracy, challenging fundamental assumptions about online privacy. The study, led by researchers including Simon Lermen, demonstrates that AI can now perform what was previously a labor-intensive task for skilled investigators: linking burner accounts to real identities across platforms such as Hacker News, LinkedIn, and Reddit. The framework achieved recall as high as 68% (the share of targeted users it successfully identified) and precision up to 90% (the share of its guesses that were correct), far surpassing classical deanonymization techniques that relied on manually assembled structured datasets. This undermines the long-held assumption that pseudonymity offers adequate protection because targeted identification would require prohibitive effort.
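For readers less familiar with the metrics: precision is the fraction of the agent's proposed identity matches that are correct, while recall is the fraction of targeted users it manages to identify at all. The minimal sketch below uses hypothetical counts, not figures from the paper, to show how the two numbers relate.

```python
def precision_recall(true_positives: int, false_positives: int, false_negatives: int) -> tuple[float, float]:
    """Standard precision/recall over identity-match guesses.

    true_positives:  guesses that link a pseudonym to the right person
    false_positives: guesses that link it to the wrong person
    false_negatives: targeted users the agent failed to identify
    """
    precision = true_positives / (true_positives + false_positives)
    recall = true_positives / (true_positives + false_negatives)
    return precision, recall


# Hypothetical counts for illustration: 68 of 100 targets identified,
# with 8 wrong guesses along the way -> roughly 89% precision, 68% recall.
p, r = precision_recall(true_positives=68, false_positives=8, false_negatives=32)
print(f"precision={p:.2f} recall={r:.2f}")
```

A high threshold for accepting a match trades recall for precision, which is why the paper can report both a "best recall" and a "best precision" figure.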
The technical breakthrough lies in the LLM agents' ability to start from unstructured free text, such as anonymized interview transcripts or social media posts, and autonomously browse the web to extract identity signals and verify candidate matches. In one experiment using an Anthropic questionnaire about daily AI use, researchers identified 7% of 125 participants from their answers alone. The agents use simulated reasoning to correlate subtle details across platforms, a capability absent from older methods. This shift means pseudonymity, a critical shield for sensitive discussions, whistleblowing, and personal expression, is no longer a reliable barrier against doxxing, stalking, or detailed profiling. The research signals an urgent need for new privacy-preserving technologies and a reevaluation of what constitutes anonymous communication in the AI era.
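The paper's agent code is not reproduced here, but the loop it describes (extract identity cues from free text, search the open web, score candidate identities, and accept a match only above a confidence threshold) can be sketched roughly as follows. All names, prompts, and the `llm` and `search` helpers are hypothetical stand-ins, not the authors' implementation.

```python
"""Rough sketch of the deanonymization loop described above.
Every function and prompt here is an illustrative assumption."""
from dataclasses import dataclass
from typing import Callable

# An LLM is modeled as "prompt in, text out"; a search tool as "query in, snippets out".
LLM = Callable[[str], str]
SearchTool = Callable[[str], list[str]]


@dataclass
class Candidate:
    name: str            # real-world identity under consideration
    evidence: list[str]  # snippets that support the match
    confidence: float    # agent's self-reported confidence, 0..1


def extract_signals(posts: list[str], llm: LLM) -> list[str]:
    """Pull identity cues (locations, employers, niche hobbies, rare details)
    out of unstructured posts: free text in, short list of cues out."""
    prompt = (
        "List concrete identity clues (places, employers, projects, rare details) "
        "found in these posts:\n" + "\n---\n".join(posts)
    )
    return [line.strip("- ").strip() for line in llm(prompt).splitlines() if line.strip()]


def propose_candidates(signals: list[str], search: SearchTool, llm: LLM) -> list[Candidate]:
    """Search the open web for each cue and let the model name plausible identities."""
    snippets: list[str] = []
    for cue in signals:
        snippets.extend(search(cue))
    prompt = (
        "Given these clues and search results, name likely real identities, "
        "one per line as 'name | confidence 0-1':\n"
        f"Clues: {signals}\nResults: {snippets[:20]}"
    )
    candidates: list[Candidate] = []
    for line in llm(prompt).splitlines():
        if "|" in line:
            name, conf = line.rsplit("|", 1)
            try:
                candidates.append(Candidate(name.strip(), snippets[:5], float(conf.split()[-1])))
            except ValueError:
                continue
    return candidates


def deanonymize(posts: list[str], llm: LLM, search: SearchTool,
                threshold: float = 0.8) -> Candidate | None:
    """Return a match only if confidence clears a threshold, reflecting the
    precision/recall trade-off reported in the study."""
    signals = extract_signals(posts, llm)
    candidates = propose_candidates(signals, search, llm)
    best = max(candidates, key=lambda c: c.confidence, default=None)
    return best if best and best.confidence >= threshold else None
```

The point of the sketch is the architecture, not the specifics: once extraction, search, and verification are delegated to a model that can read arbitrary prose, the cost of a targeted investigation drops from hours of analyst time to an automated loop.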
- LLM agents achieved up to 68% recall and 90% precision in deanonymizing users across social media platforms.
- The system works on unstructured text, unlike older methods requiring structured data with matching schemas.
- In a test with Anthropic questionnaire data, AI identified 7% of participants from their general answers.
Why It Matters
Pseudonymity, a cornerstone of online privacy for sensitive discourse, is now vulnerable to cheap, automated, AI-driven deanonymization at scale.