Study: VLMs and LAMs Outperform RL in Brain Encoding During Gameplay
Vision-language and action models align with human brain activity better than reinforcement learning agents.
A new study by Subba Reddy Oota and colleagues (arXiv:2605.19352) investigates how AI models' internal representations align with human brain activity during naturalistic gameplay. Using fMRI recordings from participants playing Atari-style video games, they compared vision-language models (VLMs) and large-action models (LAMs) against standard reinforcement learning (RL) baselines. Both VLM and LAM families significantly outperformed RL in voxel-wise encoding performance, even when feature dimensionality was matched. The researchers used action-focused and reasoning-focused prompts to probe how each model represents the task.
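To make "voxel-wise encoding performance" concrete: an encoding model of this kind is commonly a regularized linear regression from model features to each voxel's fMRI response, scored by the correlation between predicted and measured activity on held-out data. The summary does not say which regression, regularization strength, or metric the study used, so the following is only a minimal sketch under those assumptions. The function names, the `alpha` value, and the synthetic data are hypothetical.

```python
import numpy as np


def fit_ridge(X, Y, alpha=1.0):
    """Closed-form ridge weights mapping features X (n, d) to voxels Y (n, v)."""
    d = X.shape[1]
    return np.linalg.solve(X.T @ X + alpha * np.eye(d), X.T @ Y)


def voxelwise_pearson(Y_true, Y_pred):
    """Pearson r between measured and predicted responses, one value per voxel."""
    yt = Y_true - Y_true.mean(axis=0)
    yp = Y_pred - Y_pred.mean(axis=0)
    denom = np.linalg.norm(yt, axis=0) * np.linalg.norm(yp, axis=0)
    return (yt * yp).sum(axis=0) / np.where(denom == 0, 1.0, denom)


def encoding_score(X_train, Y_train, X_test, Y_test, alpha=1.0):
    """Fit on training timepoints, return per-voxel r on held-out timepoints."""
    # Standardize features with training statistics only, to avoid leakage.
    mu = X_train.mean(axis=0)
    sd = X_train.std(axis=0)
    sd = np.where(sd == 0, 1.0, sd)
    Xtr = (X_train - mu) / sd
    Xte = (X_test - mu) / sd
    y_mu = Y_train.mean(axis=0)
    W = fit_ridge(Xtr, Y_train - y_mu, alpha)
    return voxelwise_pearson(Y_test, Xte @ W + y_mu)


if __name__ == "__main__":
    # Synthetic stand-in for (model features, voxel responses); not real data.
    rng = np.random.default_rng(0)
    X = rng.normal(size=(300, 20))
    Y = X @ rng.normal(size=(20, 5)) + 0.5 * rng.normal(size=(300, 5))
    print(encoding_score(X[:200], Y[:200], X[200:], Y[200:]))
```

In a comparison like the one described, the same procedure would be run once per model family, with features reduced to a common dimensionality beforehand so that score differences are not driven by feature count.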
Critical differences emerged in how prompts affected brain alignment. Prompt-driven gains scaled with the cortical processing hierarchy: frontal-parietal and motor-planning regions improved the most, while early visual cortex saw roughly half that improvement. Variance partitioning revealed a fundamental organizational divergence: VLMs were prompt-symmetric (12.5% unique action variance vs. 13.6% unique reasoning variance), whereas LAMs were prompt-asymmetric (27% unique action vs. -5% unique reasoning), with the asymmetry strongest in frontal-motor cortex. These results indicate that action-specialized fine-tuning reorganizes multimodal representations toward action-relevant neural computations, even when whole-brain prediction accuracy is equivalent. The work bridges neuroscience and AI, offering insights into how models learn to plan and act.
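Variance partitioning, as used above, is typically computed by fitting three encoding models (feature set A alone, feature set B alone, and both concatenated) and taking differences of their held-out R². The sketch below follows that standard recipe with a ridge model; it is an illustration, not the study's pipeline, and the function names, `alpha`, and synthetic data are assumptions. It also shows why a unique share can be negative, as with the LAM reasoning figure: on held-out data, adding a feature set that carries no extra signal can slightly lower the joint model's R².

```python
import numpy as np


def heldout_r2(X_train, Y_train, X_test, Y_test, alpha=1.0):
    """Mean held-out R^2 across voxels for a ridge encoding model."""
    x_mu = X_train.mean(axis=0)
    y_mu = Y_train.mean(axis=0)
    Xtr = X_train - x_mu
    Xte = X_test - x_mu
    d = Xtr.shape[1]
    W = np.linalg.solve(Xtr.T @ Xtr + alpha * np.eye(d), Xtr.T @ (Y_train - y_mu))
    pred = Xte @ W + y_mu
    ss_res = ((Y_test - pred) ** 2).sum(axis=0)
    ss_tot = ((Y_test - Y_test.mean(axis=0)) ** 2).sum(axis=0)
    return float(np.mean(1.0 - ss_res / ss_tot))


def partition_variance(A_train, B_train, Y_train, A_test, B_test, Y_test, alpha=1.0):
    """Split joint explained variance into unique-A, unique-B, and shared parts."""
    r2_a = heldout_r2(A_train, Y_train, A_test, Y_test, alpha)
    r2_b = heldout_r2(B_train, Y_train, B_test, Y_test, alpha)
    r2_ab = heldout_r2(
        np.hstack([A_train, B_train]), Y_train,
        np.hstack([A_test, B_test]), Y_test, alpha,
    )
    return {
        "unique_a": r2_ab - r2_b,       # what A adds beyond B
        "unique_b": r2_ab - r2_a,       # what B adds beyond A; may be negative
        "shared": r2_a + r2_b - r2_ab,  # explained by either set
        "joint": r2_ab,
    }


if __name__ == "__main__":
    # Synthetic case: responses depend on A ("action" features) only.
    rng = np.random.default_rng(1)
    A = rng.normal(size=(600, 5))
    B = rng.normal(size=(600, 5))
    Y = A @ np.ones((5, 3)) + 0.1 * rng.normal(size=(600, 3))
    print(partition_variance(A[:400], B[:400], Y[:400], A[400:], B[400:], Y[400:]))
```

Here A and B would correspond to features extracted under the action-focused and reasoning-focused prompts. The percentages reported in the study may be normalized differently (for example, relative to joint explained variance); the summary does not say.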
- VLMs and LAMs outperform RL baselines in voxel-wise brain encoding during gameplay, even with matched feature dimensionality.
- Prompt-driven gains scale with cortical hierarchy – largest in frontal-parietal/motor regions, roughly half as much in early visual cortex.
- Variance partitioning shows VLMs are prompt-symmetric (12.5% unique action vs. 13.6% unique reasoning) and LAMs are prompt-asymmetric (27% vs. -5%), especially in frontal-motor cortex.
Why It Matters
The study shows how AI models' reasoning and action representations map to human brain activity, advancing interpretability and neuro-AI alignment.