Research & Papers

[R] Large-Scale Online Deanonymization with LLMs

LLM agents can identify anonymous users across platforms with high precision, at the scale of candidate pools in the tens of thousands.

Deep Dive

A new paper from MATS Research, ETH Zurich, and Anthropic shows that Large Language Model (LLM) agents can identify individuals from their anonymous online posts. The study, titled "Large-Scale Online Deanonymization with LLMs," reports that from just a few comments on platforms like Hacker News, Reddit, and LinkedIn, an AI agent can infer key personal attributes (including geographic location, profession, and specific interests) and then run targeted web searches to pinpoint the user's real identity. This automates a process that previously required manual investigation, turning a theoretical privacy risk into a practical and scalable one.

The method uses LLMs as reasoning agents that parse unstructured text from anonymous posts, extract identifying clues, formulate search queries, and evaluate the results against a candidate pool of tens of thousands. It has long been known that a few attributes can uniquely identify a person; what this research adds is an automated, agentic system that performs the identification at scale. The implications extend to online privacy, data protection regulations like GDPR, and the security of anonymized datasets used in research. The work forces a re-evaluation of what counts as 'anonymous' data in the age of advanced AI and underscores the need for more robust anonymization techniques and updated legal frameworks.
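The point that a few attributes can single out one person is easy to check on synthetic data. The sketch below is not the paper's method and involves no LLM or web search: it builds a hypothetical pool of 100 made-up candidates (one per attribute combination, an assumption chosen for clarity) and counts how many remain consistent as each inferred clue is applied. All names, attribute values, and the `build_pool` and `narrow` helpers are illustrative.

```python
from itertools import product

# Hypothetical attribute values; nothing here comes from the paper.
LOCATIONS = ["Zurich", "Berlin", "Austin", "Osaka"]
PROFESSIONS = ["nurse", "engineer", "teacher", "chef", "pilot"]
INTERESTS = ["chess", "climbing", "sailing", "baking", "birding"]


def build_pool():
    """Create one synthetic candidate per attribute combination (4 * 5 * 5 = 100)."""
    combos = product(LOCATIONS, PROFESSIONS, INTERESTS)
    return [
        {"id": i, "location": loc, "profession": prof, "interest": hobby}
        for i, (loc, prof, hobby) in enumerate(combos)
    ]


def narrow(pool, clues):
    """Apply clues one at a time and record the anonymity-set size after each."""
    sizes = [len(pool)]
    remaining = pool
    for attribute, value in clues.items():
        remaining = [c for c in remaining if c[attribute] == value]
        sizes.append(len(remaining))
    return sizes, remaining


pool = build_pool()
clues = {"location": "Zurich", "profession": "engineer", "interest": "climbing"}
sizes, remaining = narrow(pool, clues)
print(sizes)  # [100, 25, 5, 1]
```

In this uniform toy pool, three clues reduce 100 candidates to one. Real populations are skewed rather than uniform, so the number of clues needed in practice will differ; the sketch only illustrates why each additional inferred attribute shrinks the set of plausible matches.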

Key Points
  • LLM agents achieve high-precision user identification from anonymous posts on Hacker News, Reddit, and LinkedIn.
  • The system scales to candidate pools of tens of thousands, automating what was a manual investigative process.
  • From a handful of comments, agents infer location, job, and interests to execute targeted web searches.

Why It Matters

Automated deanonymization makes large-scale privacy breaches practical, calling into question the security of 'anonymous' online data and of anonymized datasets used in research.