Brain2Text decodes fMRI into captions with state-of-the-art accuracy

arXiv q-bio.NC June 09, 2026

⚡ New AI reads your visual experiences and describes them in text

Deep Dive

Feihan Feng and Jingxin Nie have introduced Brain2Text, a deep learning framework that decodes fMRI signals directly into textual descriptions of visual stimuli. Unlike previous approaches, Brain2Text is trained without any visual information: it learns solely from paired brain activity and text captions. Despite this constraint, it achieves state-of-the-art semantic decoding performance, generating accurate captions that capture the core semantic content of complex natural images. Because it maps brain activity straight to language instead of first reconstructing the image at the pixel level, the model's architecture and training paradigm offer a more direct and interpretable bridge between the two.
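The architecture of Brain2Text is not detailed here, so the sketch below is only a hedged illustration of the general idea of learning from paired brain responses and caption text alone, with no image features involved. It uses a swapped-in and much simpler technique, not the authors' method: a linear ridge-regression decoder from fMRI voxels to a caption-embedding space, followed by picking the most cosine-similar candidate caption. Brain2Text itself generates captions with a deep network. The function names `fit_ridge` and `decode_captions`, the use of precomputed caption embeddings, and the synthetic data are all hypothetical.

```python
import numpy as np

def fit_ridge(X, Y, alpha=1.0):
    """Closed-form ridge regression W = (X^T X + alpha*I)^-1 X^T Y.

    X: (n_trials, n_voxels) fMRI responses.
    Y: (n_trials, d) embeddings of the paired captions (hypothetical text encoder).
    No image pixels or image features are used anywhere.
    """
    n_voxels = X.shape[1]
    return np.linalg.solve(X.T @ X + alpha * np.eye(n_voxels), X.T @ Y)

def decode_captions(X_test, W, candidate_embs):
    """Predict a caption embedding per trial, then return the index of the
    candidate caption with the highest cosine similarity (retrieval, not generation)."""
    pred = X_test @ W
    pred /= np.linalg.norm(pred, axis=1, keepdims=True)
    cands = candidate_embs / np.linalg.norm(candidate_embs, axis=1, keepdims=True)
    return np.argmax(pred @ cands.T, axis=1)

# Synthetic demo: voxel responses are a hypothetical linear function of the
# caption embedding plus a little noise.
rng = np.random.default_rng(0)
n_train, n_test, n_voxels, d = 200, 20, 50, 16
A = rng.normal(size=(d, n_voxels))
Y_train = rng.normal(size=(n_train, d))
X_train = Y_train @ A + 0.1 * rng.normal(size=(n_train, n_voxels))
Y_test = rng.normal(size=(n_test, d))
X_test = Y_test @ A + 0.1 * rng.normal(size=(n_test, n_voxels))

W = fit_ridge(X_train, Y_train, alpha=1.0)
picked = decode_captions(X_test, W, Y_test)
accuracy = float(np.mean(picked == np.arange(n_test)))
print(f"top-1 caption retrieval accuracy: {accuracy:.2f}")
```

A linear map is chosen here only because it makes the "brain activity in, language representation out" pairing easy to see; real fMRI data are far noisier than this toy setup, and a generative captioning model replaces the retrieval step in systems like the one described above.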

Beyond performance, Brain2Text offers neuroscientific insights. Neuroanatomical localization revealed that higher-level visual cortices, including the MT+ complex, the ventral stream visual cortex, and the inferior parietal cortex, are critical for visual semantic processing. Category-specific analyses further showed distinct neural representations for semantic dimensions such as animacy and motion. The framework advances brain-computer interface research and opens new avenues for understanding the distributed semantic network of the human brain. It may also inspire more efficient, brain-inspired language models that draw on biological principles of representation.

Key Points

Brain2Text decodes fMRI signals into captions without any visual training data, achieving state-of-the-art semantic accuracy
Neuroanatomical analysis identifies MT+ complex, ventral stream, and inferior parietal cortex as key regions for visual semantics
Category-specific analysis reveals distinct neural representations for animacy and motion, deepening understanding of semantic organization

Why It Matters

Bridges neuroscience and AI to decode visual semantics, advancing brain-computer interfaces and brain-inspired language models.
