Instruction-Tuned MLLMs Align Closer to Human Brain Activity Than Unimodal Models

arXiv q-bio.NC May 21, 2026

⚡IT-MLLMs predict fMRI responses 9% better than ICL models during movie watching.

Deep Dive

A new study on arXiv (2506.08277) by Oota et al. probes how instruction-tuned multimodal large language models (IT-MLLMs) align with human brain activity. Using fMRI recordings from participants watching naturalistic movie clips (video with audio), the team extracted representations from six video and two audio IT-MLLMs under 13 task instructions. They then measured brain alignment, defined as how well model representations predict voxel-wise fMRI responses, and compared it against several baselines: in-context learning (ICL) multimodal models, non-instruction-tuned multimodal models, and unimodal models (text-only or vision-only).
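To make "brain alignment" concrete, here is a minimal sketch of a voxel-wise encoding model: fit a ridge regression from model features to fMRI responses on training time points, then score each voxel by the Pearson correlation between predicted and held-out responses. This is an illustration under assumptions, not the authors' pipeline. The function names (`ridge_fit`, `voxelwise_pearson`, `brain_alignment`), the single train/test split, the fixed `alpha`, and the synthetic data are all hypothetical, and real analyses typically add steps (such as hemodynamic delays and cross-validated regularization) that the summary above does not describe.

```python
import numpy as np


def ridge_fit(X, Y, alpha=1.0):
    """Closed-form ridge regression: W = (X^T X + alpha * I)^-1 X^T Y."""
    n_features = X.shape[1]
    gram = X.T @ X + alpha * np.eye(n_features)
    return np.linalg.solve(gram, X.T @ Y)


def voxelwise_pearson(Y_true, Y_pred):
    """Pearson correlation between true and predicted responses, one value per voxel."""
    yt = Y_true - Y_true.mean(axis=0)
    yp = Y_pred - Y_pred.mean(axis=0)
    denom = np.linalg.norm(yt, axis=0) * np.linalg.norm(yp, axis=0)
    return (yt * yp).sum(axis=0) / denom


def brain_alignment(features, fmri, train_frac=0.8, alpha=1.0):
    """Fit on the first part of the time series, score on the held-out remainder."""
    n_train = int(train_frac * len(features))
    X_train, X_test = features[:n_train], features[n_train:]
    Y_train, Y_test = fmri[:n_train], fmri[n_train:]

    # Standardize features using training statistics only.
    mu = X_train.mean(axis=0)
    sd = X_train.std(axis=0) + 1e-8
    X_train = (X_train - mu) / sd
    X_test = (X_test - mu) / sd

    # Center responses so the model does not need an explicit intercept.
    y_mu = Y_train.mean(axis=0)
    W = ridge_fit(X_train, Y_train - y_mu, alpha)
    Y_pred = X_test @ W + y_mu
    return voxelwise_pearson(Y_test, Y_pred)


# Synthetic stand-ins (assumption): 400 time points, 16 model features, 50 voxels.
rng = np.random.default_rng(0)
n_time, n_feat, n_vox = 400, 16, 50
features = rng.standard_normal((n_time, n_feat))
true_W = rng.standard_normal((n_feat, n_vox))
fmri = features @ true_W + 0.5 * rng.standard_normal((n_time, n_vox))

scores = brain_alignment(features, fmri)
print("mean voxel-wise correlation:", scores.mean())
```

Comparing the mean score obtained from two different feature sets (for example, one per model family) gives a relative alignment figure of the kind reported in the study.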

The results show a clear hierarchy. IT-MLLMs achieved the highest brain alignment: about 9% better than ICL models, about 15% better than non-instruction-tuned multimodal models, and about 20% better than unimodal baselines. ICL models exhibited strong semantic organization (Pearson r=0.78 with instruction-text embeddings), while IT models showed weak coupling to instruction semantics (r=0.14). This dissociation suggests that instruction tuning creates task-conditioned subspaces in the model's representational space, and that these subspaces align more closely with how the brain processes naturalistic stimuli. The findings open new avenues for mapping joint information processing between AI and biological neural systems.
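One way to picture a correlation between model representations and instruction-text embeddings is a representational-similarity comparison: build a pairwise similarity matrix across the 13 instructions for each space, then correlate the off-diagonal entries. The summary does not say how the reported r values were computed, so the sketch below is an assumption about the general idea rather than the paper's procedure. The names `cosine_similarity_matrix`, `upper_triangle`, and `semantic_coupling`, the embedding sizes, and the random data are all hypothetical.

```python
import numpy as np


def cosine_similarity_matrix(X):
    """Pairwise cosine similarity between the rows of X."""
    Xn = X / np.linalg.norm(X, axis=1, keepdims=True)
    return Xn @ Xn.T


def upper_triangle(M):
    """Off-diagonal upper-triangle entries of a square matrix, as a flat vector."""
    i, j = np.triu_indices(M.shape[0], k=1)
    return M[i, j]


def semantic_coupling(model_reps, text_embs):
    """Pearson r between the two pairwise-similarity structures."""
    a = upper_triangle(cosine_similarity_matrix(model_reps))
    b = upper_triangle(cosine_similarity_matrix(text_embs))
    return float(np.corrcoef(a, b)[0, 1])


# Synthetic stand-ins (assumption): one vector per instruction, 13 instructions.
rng = np.random.default_rng(0)
n_instructions, dim = 13, 32
text_embs = rng.standard_normal((n_instructions, dim))

# "Coupled" representations track the instruction text closely;
# "unrelated" representations are independent of it.
coupled_reps = text_embs + 0.1 * rng.standard_normal((n_instructions, dim))
unrelated_reps = rng.standard_normal((n_instructions, dim))

r_coupled = semantic_coupling(coupled_reps, text_embs)
r_unrelated = semantic_coupling(unrelated_reps, text_embs)
print("coupled:", r_coupled, "unrelated:", r_unrelated)
```

In this toy setup a high r means the model's representations are organized like the wording of the instructions, and a low r means they are organized along some other axis.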

Key Points

IT-MLLMs achieved 9% higher brain alignment than ICL models and 20% higher than unimodal baselines.
ICL models show strong semantic organization (r=0.78) while IT models show weak coupling to instruction semantics (r=0.14).
Task-specific instructions create distinct neural representations across brain regions, enabling more biologically aligned AI.

Why It Matters

This research suggests that instruction tuning moves multimodal model representations closer to how the brain responds to naturalistic stimuli, which could guide the development of more biologically plausible AI systems.
