LLMs and Human Brains Share Emotional Valence Axis, Study Finds
A single direction in LLM representations maps directly to human neural activity across 123 subjects.
A new paper from Yousef A. Radwan and colleagues at KAUST reports evidence that large language models (LLMs) and the human brain share a common axis for emotional valence. Using just nine emotion-evocative sentences, the researchers constructed a one-dimensional valence direction (the V-axis) from LLM representations. This axis showed strong zero-shot transfer to sentiment benchmarks and remained consistent across 14 different LLMs, pointing to a representational structure shared across models.
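To make the idea of a one-dimensional valence direction concrete, here is a minimal sketch. The article does not describe how the paper extracts the V-axis, so this example assumes a simple difference-of-means construction and uses synthetic vectors in place of real LLM embeddings; the names `valence_axis`, `project`, and `planted`, and the 3/3/3 split of positive, neutral, and negative sentences, are hypothetical.

```python
import numpy as np

def valence_axis(embeddings, valence):
    """Unit vector pointing from the mean negative to the mean positive embedding.

    Assumption: a difference-of-means construction, not necessarily the paper's.
    """
    embeddings = np.asarray(embeddings, dtype=float)
    valence = np.asarray(valence, dtype=float)
    pos_mean = embeddings[valence > 0].mean(axis=0)
    neg_mean = embeddings[valence < 0].mean(axis=0)
    axis = pos_mean - neg_mean
    return axis / np.linalg.norm(axis)

def project(embeddings, axis):
    """Scalar position of each embedding along the axis."""
    return np.asarray(embeddings, dtype=float) @ axis

# Synthetic stand-in for nine sentence embeddings (hypothetical data).
rng = np.random.default_rng(0)
dim = 64
planted = rng.normal(size=dim)
planted /= np.linalg.norm(planted)  # the "true" valence direction we plant

# Hypothetical labels: three positive, three neutral, three negative sentences.
valence = np.array([1, 1, 1, 0, 0, 0, -1, -1, -1], dtype=float)
embeddings = 3.0 * valence[:, None] * planted + 0.1 * rng.normal(size=(9, dim))

v_axis = valence_axis(embeddings, valence)
scores = project(embeddings, v_axis)         # one valence score per sentence
cosine_to_planted = float(v_axis @ planted)  # how well the axis was recovered
```

The same cosine comparison is one way to check whether two independently obtained directions, for example from two different models, point the same way.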
The study then showed that this LLM-derived V-axis maps directly onto human neural activity. In a public EEG dataset of 123 subjects watching affective videos, a single linear projection on EEG features could accurately track the V-axis position of each stimulus. Notably, 36 EEG emotion classifiers trained independently without exposure to the V-axis spontaneously rediscovered the same direction in their internal representations, indicating that the valence structure emerges naturally in both language models and human electrophysiology.
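A single linear projection from EEG features to a stimulus's V-axis position can be sketched as a regularized linear regression. The example below is an illustration under stated assumptions, not the paper's pipeline: the features are random synthetic data, ridge regression is a swapped-in choice of linear fit, and `fit_ridge`, `pearson_r`, and the regularization strength `lam` are hypothetical names and settings.

```python
import numpy as np

def fit_ridge(X, y, lam=1.0):
    """Closed-form ridge regression with an intercept (via mean-centering)."""
    x_mean, y_mean = X.mean(axis=0), y.mean()
    Xc, yc = X - x_mean, y - y_mean
    w = np.linalg.solve(Xc.T @ Xc + lam * np.eye(X.shape[1]), Xc.T @ yc)
    b = y_mean - x_mean @ w
    return w, b

def pearson_r(a, b):
    """Pearson correlation between two 1-D arrays."""
    return float(np.corrcoef(a, b)[0, 1])

# Synthetic stand-in: rows are trials, columns are EEG features (hypothetical).
rng = np.random.default_rng(0)
n_trials, n_features = 200, 20
X = rng.normal(size=(n_trials, n_features))
w_true = rng.normal(size=n_features)
# Target: V-axis position of the stimulus shown on each trial, plus noise.
y = X @ w_true + 0.1 * rng.normal(size=n_trials)

# Simple index-based hold-out split for illustration only.
train, test = slice(0, 150), slice(150, None)
w, b = fit_ridge(X[train], y[train])
y_pred = X[test] @ w + b
r_test = pearson_r(y_pred, y[test])  # how well the projection tracks the V-axis
```

With real recordings, the held-out split would normally be made by subject or by stimulus rather than by row index, so that the evaluation reflects generalization to unseen people or videos.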
However, the researchers discovered that this convergence does not translate into an effective training signal. They tested 25 alignment strategies, including knowledge distillation, representational similarity, contrastive, and topographic losses, and found that none improved decoding accuracy, while 16 significantly reduced it. They formalized this as the 'saturation regularity': once task labels alone drive a brain-decoding network onto the target direction, additional supervision mainly distorts an already-saturated basin, while the load-bearing within-class residual receives little useful gradient.
Motivated by this insight, the team developed an ensemble approach that leverages residual diversity rather than further supervising the saturated basin. This method improved balanced accuracy by 10.5% over the prior best on the FACED dataset, with consistent results on SEED-V. The findings challenge common practices in brain decoding and suggest that the link between LLMs and human neural representations is a shared structure rather than something active supervision can strengthen.
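The general mechanics of ensembling and of the balanced-accuracy metric can be sketched as follows. This is a plain probability-averaging ensemble over synthetic model outputs, not the paper's residual-diversity method, whose details the article does not give; the helper names, the noise level, and the counts of models, classes, and samples are all hypothetical.

```python
import numpy as np

def softmax(logits):
    """Numerically stable softmax over the last axis."""
    z = logits - logits.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def balanced_accuracy(y_true, y_pred, n_classes):
    """Mean of per-class recall, so every class counts equally."""
    recalls = [np.mean(y_pred[y_true == c] == c) for c in range(n_classes)]
    return float(np.mean(recalls))

# Synthetic stand-in for several independently trained classifiers.
rng = np.random.default_rng(0)
n_samples, n_classes, n_models = 600, 3, 5
y_true = rng.integers(0, n_classes, size=n_samples)

# Each model sees the same weak class signal plus its own independent noise,
# which plays the role of diversity between ensemble members.
signal = np.eye(n_classes)[y_true]
logits = signal[None] + 1.5 * rng.normal(size=(n_models, n_samples, n_classes))
probs = softmax(logits)  # shape: (n_models, n_samples, n_classes)

# Balanced accuracy of each member on its own.
individual = [
    balanced_accuracy(y_true, p.argmax(axis=1), n_classes) for p in probs
]

# Ensemble: average the members' class probabilities, then take the argmax.
ensemble_pred = probs.mean(axis=0).argmax(axis=1)
ensemble = balanced_accuracy(y_true, ensemble_pred, n_classes)
```

Averaging helps here because the members' errors are independent by construction; members that make identical mistakes would gain nothing from being combined.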
- V-axis derived from 9 sentences generalizes across 14 LLMs and maps to EEG in 123 subjects
- Saturation regularity: 25 alignment strategies tested, 16 hurt accuracy, none helped
- Ensembling across residual diversity boosts balanced accuracy by 10.5% on FACED dataset
Why It Matters
The results suggest that LLMs and human brains encode emotional valence along a shared direction, which could inform brain-computer interface design and AI alignment research.