Study: VLMs don't uniformly beat LLMs in mimicking human reading

arXiv q-bio.NC May 28, 2026

⚡Multimodal training doesn't guarantee more human-like text processing, new study finds

Deep Dive

A team of researchers led by Jinzhou Wu compared large language models (LLMs) and vision-language models (VLMs) to test whether multimodal pretraining makes text representations more human-like during natural reading. They used tightly matched model pairs in a strictly text-only setting, so that any difference reflects multimodal training history rather than visual input at inference time. Human alignment was measured against whole-cortex fMRI responses and synchronized eye-tracking saccades from a natural reading dataset.

The results show that multimodal pretraining does not confer a uniform, global advantage. VLMs outperformed LLMs only on sentences with stronger visual semantic content, and this effect appeared in both brain activity and eye movement patterns. The findings challenge the assumption that adding vision to language models automatically improves their alignment with human cognition during reading.
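
To make the "no global advantage, selective advantage" pattern concrete, here is a minimal sketch of how such a comparison could be tabulated. It is not the authors' pipeline: this summary does not describe how alignment was scored, so the function name `split_advantage`, the per-sentence scores, the visual-content ratings, and the 0.5 threshold are all hypothetical and chosen only for illustration.

```python
from statistics import mean


def split_advantage(vlm_scores, llm_scores, visual_ratings, threshold):
    """Mean (VLM - LLM) alignment difference, overall and split by visual content.

    All inputs are per-sentence lists. The scores and ratings are hypothetical
    stand-ins, not quantities defined in the article.
    """
    if not (len(vlm_scores) == len(llm_scores) == len(visual_ratings)):
        raise ValueError("all inputs must have one entry per sentence")

    # Per-sentence advantage of the VLM over its matched LLM.
    diffs = [v - l for v, l in zip(vlm_scores, llm_scores)]

    # Split sentences by how visual their content is rated.
    high = [d for d, r in zip(diffs, visual_ratings) if r >= threshold]
    low = [d for d, r in zip(diffs, visual_ratings) if r < threshold]

    return {
        "overall": mean(diffs),
        "high_visual": mean(high) if high else None,
        "low_visual": mean(low) if low else None,
    }


# Toy numbers, invented for illustration only.
vlm = [0.30, 0.22, 0.26, 0.25]
llm = [0.20, 0.14, 0.27, 0.24]
ratings = [0.9, 0.8, 0.2, 0.1]

result = split_advantage(vlm, llm, ratings, threshold=0.5)
print(result)
```

In this toy setup the VLM's edge is concentrated in the high-visual sentences and vanishes in the low-visual ones, which is the shape of result the study reports. A real analysis would also need a significance test and a principled measure of visual semantic content.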

Key Points

VLMs showed no global advantage over LLMs in aligning with human brain activity during natural reading
Selective VLM advantage emerged only for sentences with strong visual semantic content
Study used whole-cortex fMRI and eye-tracking data to compare model-human alignment

Why It Matters

The study challenges the assumption that multimodal models are inherently more human-like, a finding that can inform how models are designed for language understanding.
