Brain2Text decodes fMRI into captions with state-of-the-art accuracy
New AI reads your visual experiences and describes them in text
Feihan Feng and Jingxin Nie have introduced Brain2Text, a deep learning framework that decodes fMRI signals directly into textual descriptions of visual stimuli. Unlike previous approaches, Brain2Text is trained without any visual information: it learns solely from paired brain activity and text captions. Despite this constraint, it achieves state-of-the-art semantic decoding performance, generating captions that capture the core semantic content of complex natural images. Because the model maps brain activity straight to language, it provides a more direct and interpretable bridge between the two and bypasses the need for pixel-level reconstruction.
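To make the "paired brain activity and text, no images" idea concrete, here is a minimal sketch. It is not the Brain2Text architecture, which the summary above does not detail. It swaps in a much simpler technique: ridge regression from voxel patterns to caption embeddings, followed by nearest-caption retrieval. All data are synthetic, and every name (`fit_ridge`, `decode_caption`, the voxel count, the embedding size) is a hypothetical chosen for illustration.

```python
import numpy as np

def fit_ridge(X, Y, alpha=1.0):
    """Closed-form ridge regression: W = (X^T X + alpha*I)^-1 X^T Y."""
    n_features = X.shape[1]
    return np.linalg.solve(X.T @ X + alpha * np.eye(n_features), X.T @ Y)

def decode_caption(x, W, caption_embeddings, captions):
    """Project one voxel pattern into embedding space, then return the
    caption whose embedding has the highest cosine similarity."""
    pred = x @ W
    pred = pred / np.linalg.norm(pred)
    emb = caption_embeddings / np.linalg.norm(caption_embeddings, axis=1, keepdims=True)
    return captions[int(np.argmax(emb @ pred))]

# --- Synthetic stand-ins (assumptions, not real fMRI or real text embeddings) ---
rng = np.random.default_rng(0)
captions = ["a dog running on grass", "a plate of food", "a city street at night"]
n_voxels, dim, n_trials = 50, 8, 200

# Random vectors standing in for the output of some text encoder.
caption_embeddings = rng.normal(size=(len(captions), dim))
# Hidden linear map used only to generate fake "brain responses".
true_map = rng.normal(size=(dim, n_voxels))

# Training pairs: (voxel pattern, caption embedding). No image is ever used.
labels = rng.integers(0, len(captions), size=n_trials)
X = caption_embeddings[labels] @ true_map + 0.1 * rng.normal(size=(n_trials, n_voxels))
Y = caption_embeddings[labels]

W = fit_ridge(X, Y, alpha=1.0)

# Decode a held-out synthetic response generated from the first caption.
test_x = caption_embeddings[0] @ true_map + 0.1 * rng.normal(size=n_voxels)
decoded = decode_caption(test_x, W, caption_embeddings, captions)
print(decoded)
```

The sketch retrieves from a fixed caption list, whereas a generative decoder like the one described would produce free-form text. What it shares with the description above is only the training signal: brain responses paired with text, with no pixels involved.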
Beyond performance, Brain2Text offers neuroscientific insights. Neuroanatomical localization revealed that higher-level visual cortices, including the MT+ complex, ventral stream visual cortex, and inferior parietal cortex, are critical for visual semantic processing. Category-specific analyses further showed nuanced neural representations for semantic dimensions such as animacy and motion. The framework not only advances brain-computer interface research but also opens new avenues for understanding the distributed semantic network in the human brain, with the potential to inform more efficient language models built on biological principles of representation.
- Brain2Text decodes fMRI signals into captions without any visual training data, achieving state-of-the-art semantic accuracy
- Neuroanatomical analysis identifies MT+ complex, ventral stream, and inferior parietal cortex as key regions for visual semantics
- Category-specific analysis reveals distinct neural representations for animacy and motion, deepening understanding of semantic organization
Why It Matters
Brain2Text bridges neuroscience and AI to decode visual semantics, advancing brain-computer interfaces and brain-inspired language models.