Study: VLMs and LAMs Outperform RL in Brain Encoding During Gameplay

arXiv q-bio.NC May 20, 2026

⚡Vision-language and action models align with human brain activity better than reinforcement learning agents.

Deep Dive

A new study by Subba Reddy Oota and colleagues (arXiv:2605.19352) investigates how well the internal representations of AI models align with human brain activity during naturalistic gameplay. Using fMRI recordings from participants playing Atari-style video games, the authors compared vision-language models (VLMs) and large action models (LAMs) against standard reinforcement learning (RL) baselines. Both the VLM and LAM families significantly outperformed the RL baselines in voxel-wise encoding performance, even when feature dimensionality was matched. The researchers also used action-focused and reasoning-focused prompts to probe how each model represents the task.
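To make "voxel-wise encoding" concrete, here is a minimal sketch of one common setup: a ridge regression maps model features to each voxel's response, and the score is the Pearson correlation on held-out time points. This is an illustration under stated assumptions, not the authors' pipeline. The summary does not say which regression, cross-validation scheme, or metric the study used, and every function name and number below is hypothetical. The data are synthetic.

```python
import numpy as np

def fit_ridge(X, Y, alpha=1.0):
    """Closed-form ridge regression: one weight column per voxel."""
    n_features = X.shape[1]
    # Solve (X^T X + alpha * I) W = X^T Y for all voxels at once.
    return np.linalg.solve(X.T @ X + alpha * np.eye(n_features), X.T @ Y)

def voxelwise_pearson(Y_true, Y_pred):
    """Pearson correlation between measured and predicted responses, per voxel."""
    yt = Y_true - Y_true.mean(axis=0)
    yp = Y_pred - Y_pred.mean(axis=0)
    denom = np.sqrt((yt ** 2).sum(axis=0) * (yp ** 2).sum(axis=0))
    return (yt * yp).sum(axis=0) / denom

def encoding_scores(features, bold, n_train, alpha=1.0):
    """Fit on the first n_train time points, score on the held-out remainder."""
    X_train, X_test = features[:n_train], features[n_train:]
    Y_train, Y_test = bold[:n_train], bold[n_train:]
    # Standardize features with training statistics only (no test leakage).
    mu, sd = X_train.mean(axis=0), X_train.std(axis=0) + 1e-8
    X_train, X_test = (X_train - mu) / sd, (X_test - mu) / sd
    y_mu = Y_train.mean(axis=0)
    W = fit_ridge(X_train, Y_train - y_mu, alpha)
    return voxelwise_pearson(Y_test, X_test @ W + y_mu)

# Synthetic stand-in data (assumption: NOT from the study).
rng = np.random.default_rng(0)
n_time, n_feat, n_vox = 400, 16, 50
model_features = rng.standard_normal((n_time, n_feat))
true_weights = rng.standard_normal((n_feat, n_vox))
bold = model_features @ true_weights + rng.standard_normal((n_time, n_vox))
# A second feature space with the same dimensionality but no relation to the signal.
unrelated_features = rng.standard_normal((n_time, n_feat))

scores_informative = encoding_scores(model_features, bold, n_train=300)
scores_unrelated = encoding_scores(unrelated_features, bold, n_train=300)
print(scores_informative.mean(), scores_unrelated.mean())
```

Both feature spaces here have 16 dimensions, so any gap between the two mean scores reflects what the features encode rather than how many there are. That is the intuition behind matching feature dimensionality when comparing model families.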

The prompts affected brain alignment differently across brain regions and model families. Prompt-driven gains scaled with the cortical processing hierarchy: frontal-parietal and motor-planning regions improved the most, while early visual cortex saw roughly half that improvement. Variance partitioning revealed a divergence in how the two families organize their representations: VLMs were prompt-symmetric (12.5% unique action vs. 13.6% unique reasoning), whereas LAMs were prompt-asymmetric (27% unique action vs. -5% unique reasoning), with the asymmetry strongest in frontal-motor cortex. According to the authors, these results show that action-specialized fine-tuning reorganizes multimodal representations toward action-relevant neural computations, even when whole-brain prediction accuracy is equivalent. The work bridges neuroscience and AI, offering insights into how models learn to plan and act.
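The variance-partitioning figures above, including the negative value, are easier to read with the arithmetic in view. The sketch below uses one standard two-way formulation: fit an encoding model on each feature space alone and on both together, then split the joint explained variance into unique and shared parts. The study may normalize or aggregate differently. The function name and the R² values are hypothetical and are not taken from the paper.

```python
def partition_variance(r2_a, r2_b, r2_joint):
    """Two-way variance partitioning from three held-out R^2 scores.

    r2_a     : model fitted on feature space A alone (e.g. action-prompt features)
    r2_b     : model fitted on feature space B alone (e.g. reasoning-prompt features)
    r2_joint : model fitted on A and B concatenated
    """
    return {
        "unique_a": r2_joint - r2_b,        # what A adds beyond B
        "unique_b": r2_joint - r2_a,        # what B adds beyond A
        "shared": r2_a + r2_b - r2_joint,   # variance either space can explain
    }

# Hypothetical scores for a single voxel (assumption: illustrative only).
example = partition_variance(r2_a=0.10, r2_b=0.08, r2_joint=0.095)
for name, value in example.items():
    print(f"{name}: {value:+.3f}")
```

In this example the joint model scores slightly below the A-only model on held-out data, so the unique contribution of B comes out negative. With held-out scoring, adding a feature space that carries no new signal can hurt generalization, which is one way a negative unique-variance estimate can arise.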

Key Points

VLMs and LAMs outperform RL baselines in voxel-wise brain encoding during gameplay, even with matched feature dimensionality.
Prompt-driven gains scale with the cortical hierarchy: largest in frontal-parietal and motor-planning regions, roughly half as large in early visual cortex.
Variance partitioning shows VLMs are prompt-symmetric (12.5% unique action vs. 13.6% unique reasoning) and LAMs are prompt-asymmetric (27% vs. -5%), with the asymmetry strongest in frontal-motor cortex.

Why It Matters

The study shows how AI models' reasoning and action representations map onto human brain activity, advancing interpretability and neuro-AI alignment.
