A foundation model of vision, audition, and language for in-silico neuroscience
AI model trained on 1,000+ hours of fMRI outperforms traditional encoding models severalfold
TRIBE v2, developed by Stéphane d'Ascoli and colleagues, unifies what have been fragmented, paradigm-specific models of cognition into a single tri-modal foundation model. Trained on a massive dataset of over 1,000 hours of functional MRI (fMRI) recordings from 720 subjects, the model takes video, audio, and language inputs and predicts the evoked brain activity at high spatial resolution. Whereas traditional linear encoding models are tailored to specific paradigms, TRIBE v2 captures the complex, nonlinear relationships between sensory inputs and brain responses, delivering severalfold improvements in prediction accuracy for novel stimuli, tasks, and subjects.
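To make the comparison concrete, here is a minimal sketch of the classic linear encoding baseline the paragraph refers to: pretrained features from each modality are fused and regressed onto voxel responses, scored by per-voxel correlation. All dimensions, the concatenation-based fusion, and the synthetic data are illustrative assumptions, not details from the paper.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical sizes (not from the paper): time points, per-modality
# feature dimensions, and number of fMRI voxels to predict.
T, D_VID, D_AUD, D_TXT, N_VOX = 200, 64, 32, 48, 100

# Stand-ins for pretrained-model embeddings of the video, audio, and
# language streams, sampled at the fMRI acquisition rate.
video = rng.standard_normal((T, D_VID))
audio = rng.standard_normal((T, D_AUD))
text = rng.standard_normal((T, D_TXT))

# Tri-modal fusion by concatenation -- one simple choice for illustration.
features = np.concatenate([video, audio, text], axis=1)

def ridge_fit(X, Y, lam=1.0):
    """Closed-form ridge regression: the classic linear encoding baseline."""
    d = X.shape[1]
    return np.linalg.solve(X.T @ X + lam * np.eye(d), X.T @ Y)

# Synthetic "brain responses" with a nonlinear dependence on the features,
# the kind of relationship a purely linear model cannot fully capture.
W_true = rng.standard_normal((features.shape[1], N_VOX)) / np.sqrt(features.shape[1])
Y = np.tanh(features @ W_true)

W = ridge_fit(features, Y)
pred = features @ W

def voxelwise_corr(a, b):
    """Per-voxel Pearson correlation, the usual encoding-accuracy score."""
    a = (a - a.mean(0)) / a.std(0)
    b = (b - b.mean(0)) / b.std(0)
    return (a * b).mean(0)

scores = voxelwise_corr(pred, Y)
print(f"mean voxel correlation: {scores.mean():.3f}")
```

A nonlinear model such as TRIBE v2 replaces the single ridge map with a trained network over the fused features; the evaluation by voxelwise correlation stays the same.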
Critically, TRIBE v2 enables in-silico experimentation—simulating neuroscience experiments entirely within the model. Tested on seminal visual and neuro-linguistic paradigms, it reproduces results established by decades of empirical research, validating its ability to model human cognition. By extracting interpretable latent features, the model also maps the fine-grained topography of multisensory integration, revealing how the brain combines vision, audition, and language. This work establishes artificial intelligence as a powerful unifying framework for exploring the functional organization of the human brain, potentially accelerating neuroscience research by reducing the need for costly and time-consuming human experiments.
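The in-silico workflow described above can be sketched abstractly: present feature representations of two stimulus conditions to a predictive model and compare the simulated voxel responses. The stand-in linear "model", the two conditions, and the contrast statistic below are illustrative assumptions, not the paper's actual procedure.

```python
import numpy as np

rng = np.random.default_rng(1)

# Stand-in for a trained encoding model: any function mapping stimulus
# features to predicted voxel responses. Here a fixed random linear map;
# in practice TRIBE v2 itself would play this role.
N_FEAT, N_VOX = 128, 500
W = rng.standard_normal((N_FEAT, N_VOX))

def predict(stim):
    return stim @ W

# Two hypothetical stimulus conditions (e.g. "faces" vs. "scenes"),
# each a batch of feature vectors with a deliberate offset between them.
cond_a = rng.standard_normal((40, N_FEAT)) + 0.5
cond_b = rng.standard_normal((40, N_FEAT)) - 0.5

# An in-silico contrast: difference of mean predicted responses, scaled
# by pooled variability -- a simplified analogue of a t-contrast map.
resp_a, resp_b = predict(cond_a), predict(cond_b)
diff = resp_a.mean(0) - resp_b.mean(0)
pooled = np.sqrt((resp_a.var(0) + resp_b.var(0)) / 2) + 1e-8
contrast = diff / pooled

# Voxels with a large contrast in either direction are flagged as
# "selective" for one condition over the other.
selective = np.flatnonzero(np.abs(contrast) > 2.0)
print(f"{selective.size} of {N_VOX} voxels pass |contrast| > 2")
```

No new scan is acquired at any point: the entire "experiment" runs on model predictions, which is what makes reproducing classic paradigms a meaningful validation test.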
- TRIBE v2 trained on over 1,000 hours of fMRI data from 720 subjects across three modalities (video, audio, language).
- Delivers several-fold improvement in brain activity prediction accuracy compared to traditional linear encoding models.
- Recovers decades of empirical neuroscience results through in-silico experimentation on visual and neuro-linguistic paradigms.
Why It Matters
Unifies fragmented cognitive neuroscience into a single AI model for predicting human brain activity and running virtual experiments.