LLM Neuroanatomy III - LLMs seem to think in geometry, not language
New analysis of 5 major models reveals a universal 'concept space' in middle layers, challenging linguistic theories.
A new, viral analysis titled 'LLM Neuroanatomy III' provides compelling evidence that large language models (LLMs) develop an internal 'concept space' based on geometry, not language. Researcher dnhkng tested five major models—including Qwen3.5-27B, MiniMax M2.5, and Gemma-4-31B—using a novel technique to probe their internal vector representations. The key finding is that in the middle layers of these transformers, the meaning of a concept becomes disentangled from its surface language. For instance, a sentence about 'photosynthesis' in Hindi clusters closer to 'photosynthesis' in Japanese than to a sentence about 'cooking' in Hindi, demonstrating that semantic meaning transcends linguistic form.
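The released code carries the full pipeline, but the core measurement can be sketched in a few lines of Python: run a prompt through the model with hidden states enabled, mean-pool each layer into a single vector, and compare cosine similarities. The model name and the example sentences below are illustrative placeholders, not the study's actual inputs.

```python
# Minimal sketch of the layer-wise comparison described above (not the
# author's exact pipeline). MODEL_NAME and the example sentences are
# placeholders; the study used much larger models and many more prompts.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

MODEL_NAME = "Qwen/Qwen2.5-0.5B"  # placeholder checkpoint, small enough to run locally
tok = AutoTokenizer.from_pretrained(MODEL_NAME)
model = AutoModelForCausalLM.from_pretrained(MODEL_NAME)
model.eval()

@torch.no_grad()
def embed_per_layer(text: str) -> torch.Tensor:
    """Return one mean-pooled hidden-state vector per layer: (num_layers+1, hidden)."""
    inputs = tok(text, return_tensors="pt")
    out = model(**inputs, output_hidden_states=True)
    # out.hidden_states is a tuple of (1, seq_len, hidden) tensors, embeddings first
    return torch.stack([h.mean(dim=1).squeeze(0) for h in out.hidden_states])

def cos(a: torch.Tensor, b: torch.Tensor) -> float:
    return torch.nn.functional.cosine_similarity(a, b, dim=-1).item()

# Same concept across languages vs. same language across concepts.
photo_hi = embed_per_layer("प्रकाश संश्लेषण पौधों को सूर्य के प्रकाश से ऊर्जा बनाने देता है।")  # photosynthesis, Hindi
photo_ja = embed_per_layer("光合成は植物が太陽光からエネルギーを作ることを可能にする。")  # photosynthesis, Japanese
cook_hi = embed_per_layer("खाना पकाने में सब्ज़ियों को तेल में भूनना शामिल है।")  # cooking, Hindi

for layer in range(photo_hi.shape[0]):
    same_concept = cos(photo_hi[layer], photo_ja[layer])
    same_language = cos(photo_hi[layer], cook_hi[layer])
    print(f"layer {layer:2d}  same concept: {same_concept:.3f}   same language: {same_language:.3f}")
# If the article's claim holds, the same-concept similarity should overtake the
# same-language similarity somewhere in the middle of the stack.
```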
The study expanded its investigation beyond natural language, testing whether this geometric representation also applies to code and mathematical notation. Natural-language descriptions, Python functions, and LaTeX equations for the same physical concept (like kinetic energy, expressed as ½mv²) were found to converge to similar regions in the model's internal vector space. This pattern held across diverse architectures from five different organizations, suggesting it is a convergent solution in transformer-based AI rather than a training artifact. The findings directly challenge the Sapir-Whorf hypothesis (that language shapes thought) for AI systems, aligning instead with the idea of a universal, non-linguistic structure for representing knowledge, though one built on vector geometry rather than Chomskyan grammar.
The researcher has made the analysis accessible with interactive PCA visualizations on a dedicated blog and has released the underlying code and data as part of the 'RYS' project on GitHub. This work provides a clearer mechanistic understanding of how LLMs process and reason about information, moving beyond the 'black box' metaphor towards a more interpretable model of AI cognition.
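The interactive plots live on the blog, but a rough, static stand-in looks like the following. It reuses the embed_per_layer helper and the model from the sketch above, so it is not self-contained on its own; the choice of middle layer, the input texts, and the labels are all illustrative assumptions rather than the study's data.

```python
# Project mid-layer vectors of the same concept in different surface forms
# (English prose, Python, LaTeX) into 2D with PCA, loosely mimicking the
# blog's interactive views. Reuses model and embed_per_layer() from the
# sketch above; inputs and layer choice are illustrative.
import matplotlib.pyplot as plt
from sklearn.decomposition import PCA

items = {
    "kinetic energy (prose)":  "The kinetic energy of a moving object is half its mass times the square of its velocity.",
    "kinetic energy (Python)": "def kinetic_energy(m, v):\n    return 0.5 * m * v ** 2",
    "kinetic energy (LaTeX)":  r"E_k = \frac{1}{2} m v^2",
    "photosynthesis (prose)":  "Photosynthesis lets plants turn sunlight into chemical energy.",
    "photosynthesis (LaTeX)":  r"6\,CO_2 + 6\,H_2O \rightarrow C_6H_{12}O_6 + 6\,O_2",
}

mid_layer = model.config.num_hidden_layers // 2  # probe roughly the middle of the stack
vectors = torch.stack([embed_per_layer(text)[mid_layer] for text in items.values()])

coords = PCA(n_components=2).fit_transform(vectors.float().numpy())
fig, ax = plt.subplots()
ax.scatter(coords[:, 0], coords[:, 1])
for label, (x, y) in zip(items, coords):
    ax.annotate(label, (x, y), fontsize=8)
ax.set_title("Mid-layer representations, PCA to 2D (illustrative)")
plt.show()
# Under the article's claim, the three kinetic-energy points should land closer
# to one another than to the photosynthesis points, despite the change in notation.
```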
- Tested 5 models (Qwen3.5-27B, MiniMax M2.5, GLM-4.7, GPT-OSS-120B, Gemma-4-31B) across 8 languages, finding that language identity fades in the middle layers (a minimal probing sketch follows this list).
- Concepts like 'photosynthesis' or kinetic energy formulas converge to similar vector regions whether expressed in Hindi, Python, or LaTeX.
- Pattern is consistent across diverse transformer architectures, suggesting a convergent, geometric solution for representing meaning, not a linguistic one.
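One standard way to quantify the "language identity fades" observation in the first bullet is a diagnostic probe: fit a linear classifier that predicts the input language from each layer's pooled vector and watch where its accuracy drops. This is a common interpretability technique, not necessarily the exact analysis from the post; the sketch below again reuses embed_per_layer from the first example, and the sentence lists are tiny hypothetical stand-ins for a real multilingual corpus.

```python
# Language-identity probe per layer: logistic regression predicting the
# language of the input from pooled hidden states. Illustrative only; reuses
# embed_per_layer() from the first sketch, and a real run would use hundreds
# of sentences per language rather than three.
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

sentences = {
    "en": ["Water boils at one hundred degrees Celsius.",
           "The cat sleeps on the warm windowsill.",
           "The train leaves the station at noon."],
    "hi": ["पानी सौ डिग्री सेल्सियस पर उबलता है।",
           "बिल्ली गर्म खिड़की पर सोती है।",
           "रेलगाड़ी दोपहर को स्टेशन से निकलती है।"],
    "ja": ["水は摂氏百度で沸騰する。",
           "猫は暖かい窓辺で眠っている。",
           "列車は正午に駅を出発する。"],
}

texts = [t for group in sentences.values() for t in group]
labels = [lang for lang, group in sentences.items() for _ in group]

# Shape: (num_texts, num_layers+1, hidden)
all_layers = torch.stack([embed_per_layer(t) for t in texts])

for layer in range(all_layers.shape[1]):
    X = all_layers[:, layer, :].float().numpy()
    acc = cross_val_score(LogisticRegression(max_iter=1000), X, labels, cv=3).mean()
    print(f"layer {layer:2d}: language-ID probe accuracy = {acc:.2f}")
# If the reported pattern holds, probe accuracy should dip in the middle layers
# and recover toward the output layers, where the model must again commit to
# tokens in a specific language.
```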
Why It Matters
This mechanistic insight into AI 'thinking' could lead to more interpretable, efficient, and multilingual models by targeting the conceptual layer.