Identity as Attractor: Geometric Evidence for Persistent Agent Architecture in LLM Activation Space
New study shows AI agents maintain consistent internal 'identity' states across different prompts, with statistical significance p < 10⁻²⁷.
A new research paper titled 'Identity as Attractor: Geometric Evidence for Persistent Agent Architecture in LLM Activation Space' by Vladimir Vasilenko provides mathematical evidence that AI agents form stable identity representations within large language models. The study demonstrates that when LLMs process prompts about a specific agent identity, the internal activations converge to consistent geometric patterns called 'attractors', much as physical systems settle into stable states. This represents the first rigorous measurement of how AI agents maintain identity consistency at the neural network level.
The research tested this phenomenon on Meta's Llama 3.1 8B Instruct and Google's Gemma 2 9B models, comparing three conditions: original agent identity documents, paraphrased versions, and structurally matched controls. The results showed statistically significant convergence, with paraphrases clustering much more tightly than controls (Cohen's d > 1.88, p < 10⁻²⁷). The effect proved cross-architecture generalizable and primarily semantic rather than structural: the meaning of the identity matters more than its exact wording.
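The comparison described above can be sketched in a few lines. This is a minimal illustration, not the paper's actual pipeline: the toy activation vectors, the Euclidean distance metric, and the specific cluster scales are all assumptions chosen to show how one might quantify "paraphrases cluster tighter than controls" with Cohen's d on within-condition pairwise distances.

```python
import numpy as np

def cohens_d(a, b):
    """Cohen's d for two samples, using the pooled standard deviation."""
    na, nb = len(a), len(b)
    pooled = np.sqrt(((na - 1) * np.var(a, ddof=1) +
                      (nb - 1) * np.var(b, ddof=1)) / (na + nb - 2))
    return (np.mean(a) - np.mean(b)) / pooled

def within_cluster_distances(acts):
    """All pairwise Euclidean distances among one condition's activation vectors."""
    acts = np.asarray(acts)
    diffs = acts[:, None, :] - acts[None, :, :]
    dists = np.linalg.norm(diffs, axis=-1)
    iu = np.triu_indices(len(acts), k=1)  # upper triangle: each pair once
    return dists[iu]

# Toy data standing in for extracted LLM activations:
# paraphrases land near a shared point (tight cluster),
# structurally matched controls scatter more widely.
rng = np.random.default_rng(0)
center = rng.normal(size=64)
paraphrase_acts = center + 0.1 * rng.normal(size=(20, 64))
control_acts = center + 1.0 * rng.normal(size=(20, 64))

# Large positive d means control pairs sit much farther apart
# than paraphrase pairs, i.e. the paraphrase cluster is tighter.
d = cohens_d(within_cluster_distances(control_acts),
             within_cluster_distances(paraphrase_acts))
print(f"Cohen's d (control vs paraphrase dispersion): {d:.2f}")
```

On real activations one would also need a significance test (e.g. a permutation test over condition labels) to obtain the kind of p-value the paper reports.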
Crucially, the study found that simply reading about an agent identity shifts the LLM's internal state toward the attractor region, but operating as that identity creates even stronger convergence. This distinction between 'knowing about' and 'being' an identity provides new insight into how AI agents maintain persistent self-representation. The research opens the door to better understanding, and potentially controlling, how AI systems develop and maintain consistent identities across interactions.
- Agent identity documents cause LLM activations to converge to tight geometric clusters with statistical significance p < 10⁻²⁷
- Effect demonstrated across Meta's Llama 3.1 8B and Google's Gemma 2 9B, showing cross-architecture generalizability
- Distinction found between 'knowing about' an identity vs. 'operating as' that identity in activation space
Why It Matters
Provides a mathematical foundation for understanding how AI agents maintain consistent identities, which is crucial for developing reliable autonomous systems.