MIRAGE model reconstructs mental imagery from fMRI with SOTA accuracy
Researchers decode your mind's eye using brain scans and diffusion models.
In an analysis of the NSD-Imagery dataset, researchers found that some modern vision decoders work well for mental image reconstruction while others fail, and that top performance on seen images does not guarantee success on mental imagery. To address this, they developed MIRAGE, presented in the work titled "Robust multi-modal architectures translate fMRI-to-image models from vision to mental imagery." MIRAGE combines a linear backbone with multi-modal text and image features that are fed into a diffusion model. The architecture explicitly targets cross-decoding: a decoder trained on brain activity recorded while viewing images is used to reconstruct internally generated visual content.
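To make the idea of a linear backbone concrete, here is a minimal sketch, not the authors' implementation. It assumes the backbone is a set of ridge regressions that map voxel activity to separate text and image feature vectors, which a diffusion model would then consume. The function names, the feature sizes (8 and 16), and the synthetic data are all hypothetical, and the diffusion stage is omitted.

```python
import numpy as np


def fit_ridge(X, Y, alpha=1.0):
    """Closed-form ridge regression: W = (X^T X + alpha * I)^-1 X^T Y."""
    gram = X.T @ X + alpha * np.eye(X.shape[1])
    return np.linalg.solve(gram, X.T @ Y)


def fit_linear_backbone(voxels, targets, alpha=1.0):
    """Fit one linear map per feature space (hypothetical 'text' and 'image' heads)."""
    return {name: fit_ridge(voxels, Y, alpha) for name, Y in targets.items()}


def predict_features(voxels, weights):
    """Project voxel patterns into each feature space."""
    return {name: voxels @ W for name, W in weights.items()}


# Synthetic stand-in for fMRI recorded while subjects view images (assumption).
rng = np.random.default_rng(0)
n_trials, n_voxels = 200, 50
vision_voxels = rng.standard_normal((n_trials, n_voxels))

# Hypothetical ground-truth maps, used only to generate toy feature targets.
true_maps = {
    "text": rng.standard_normal((n_voxels, 8)),
    "image": rng.standard_normal((n_voxels, 16)),
}
targets = {
    name: vision_voxels @ W + 0.1 * rng.standard_normal((n_trials, W.shape[1]))
    for name, W in true_maps.items()
}

# Train on "vision" trials, then apply the same maps to held-out "imagery" trials.
weights = fit_linear_backbone(vision_voxels, targets, alpha=1.0)
imagery_voxels = rng.standard_normal((5, n_voxels))
predicted = predict_features(imagery_voxels, weights)

for name, feats in predicted.items():
    print(name, feats.shape)
```

In a real pipeline the predicted text and image features would condition a pretrained diffusion model. The point of the sketch is only that the maps are fit on viewing trials and reused unchanged on imagery trials.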
Both feature metrics and human raters rank MIRAGE as state of the art on the NSD-Imagery benchmark. Ablation analysis shows that mental image reconstruction performs best when decoders use relatively low-dimensional image features and incorporate guidance from text descriptions as well as both high-level and low-level image features. The work demonstrates that, given the right architecture, existing large-scale datasets collected using external visual stimuli can serve as effective training data for decoding mental images. This opens the door to practical applications in brain-computer interfaces and neuroscientific research.
- MIRAGE uses a linear backbone and multi-modal text+image features fed into a diffusion model to reconstruct mental imagery from fMRI.
- MIRAGE achieves state-of-the-art performance on the NSD-Imagery benchmark, surpassing other vision decoders.
- Ablation shows best results with low-dimensional image features and combined text and multi-level image guidance.
Why It Matters
Decoding internal mental imagery from brain scans enables new brain-computer interfaces and deeper understanding of visual imagination.