Modality-Aware and Anatomical Vector-Quantized Autoencoding for Multimodal Brain MRI
A new 3D VQ-VAE learns a shared latent space across T1 and T2 MRI scans, enabling cross-modal synthesis and analysis.
A research team from Stanford University and the University of Southern California has introduced NeuroQuant, a novel 3D vector-quantized variational autoencoder (VQ-VAE) designed for reconstructing multi-modal brain MRI scans. Unlike previous models that focused on a single modality such as T1-weighted MRI, NeuroQuant processes complementary modalities (T1 and T2) simultaneously. Its architecture features a dual-stream 3D encoder that explicitly factorizes the encoding: one stream captures modality-invariant anatomical structure, while the other handles modality-specific appearance. This separation is what allows the model to learn a robust, shared latent representation.
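To make the factorization concrete, here is a minimal PyTorch sketch of a dual-stream 3D encoder. The paper's exact layer counts, channel widths, and module names are not reproduced here; `DualStreamEncoder3D` and its two streams are illustrative assumptions, not the authors' implementation.

```python
import torch
import torch.nn as nn

class DualStreamEncoder3D(nn.Module):
    """Two parallel 3D conv streams: one keeps a spatial map of
    modality-invariant anatomy, the other pools modality-specific
    appearance into a global vector. Widths and depths are illustrative."""

    def __init__(self, in_ch=1, anat_ch=128, app_dim=64):
        super().__init__()

        def stream(out_ch):
            return nn.Sequential(
                nn.Conv3d(in_ch, 32, kernel_size=4, stride=2, padding=1), nn.GELU(),
                nn.Conv3d(32, 64, kernel_size=4, stride=2, padding=1), nn.GELU(),
                nn.Conv3d(64, out_ch, kernel_size=3, padding=1),
            )

        self.anatomy_stream = stream(anat_ch)        # spatial structure map
        self.appearance_stream = nn.Sequential(      # global contrast code
            stream(app_dim),
            nn.AdaptiveAvgPool3d(1),
            nn.Flatten(),
        )

    def forward(self, x):
        # x: (B, 1, D, H, W) volume from one modality (T1 or T2)
        z_anat = self.anatomy_stream(x)      # (B, anat_ch, D/4, H/4, W/4)
        z_app = self.appearance_stream(x)    # (B, app_dim)
        return z_anat, z_app

# usage: z_anat, z_app = DualStreamEncoder3D()(torch.randn(2, 1, 64, 64, 64))
```

Keeping anatomy as a spatial map while collapsing appearance to a single vector is one natural way to realize the invariant/specific split described above.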
NeuroQuant employs a factorized multi-axis attention mechanism to capture relationships between distant brain regions within this shared space. The anatomical encoding is then discretized using a shared codebook (the hallmark of VQ-VAEs) before being recombined with the modality-specific appearance features via Feature-wise Linear Modulation (FiLM) during decoding. To account for the slice-based nature of real 3D MRI acquisition, the model is trained with a joint 2D/3D strategy. Experiments on two multi-modal brain MRI datasets show that NeuroQuant achieves higher reconstruction fidelity than existing VAE approaches.
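Full self-attention over a 3D volume scales with the square of the voxel count, which is why attention over volumes is typically factorized along spatial axes. The sketch below shows one common way to do this in PyTorch: attend along the depth, height, and width axes separately and sum the results. NeuroQuant's exact factorization may differ, so `MultiAxisAttention3D` is an assumption for illustration.

```python
import torch
import torch.nn as nn

class MultiAxisAttention3D(nn.Module):
    """Axis-factorized self-attention for 3D feature maps: attend along one
    spatial axis at a time (sequence length D, H, or W instead of D*H*W)
    and sum the results. Illustrative; not the paper's exact module."""

    def __init__(self, dim, heads=4):
        super().__init__()
        self.attn = nn.MultiheadAttention(dim, heads, batch_first=True)

    def _axis_attn(self, x, axis):
        # x: (B, C, D, H, W); move the chosen spatial axis into the
        # sequence position and fold the other two into the batch.
        x = x.movedim(axis, -1)                                   # (B, C, a, b, L)
        B, C, a, b, L = x.shape
        seq = x.permute(0, 2, 3, 4, 1).reshape(B * a * b, L, C)   # (N, L, C)
        out, _ = self.attn(seq, seq, seq, need_weights=False)
        out = out.reshape(B, a, b, L, C).permute(0, 4, 1, 2, 3)   # (B, C, a, b, L)
        return out.movedim(-1, axis)

    def forward(self, x):
        # residual sum of attention along depth (2), height (3), width (4)
        return x + sum(self._axis_attn(x, ax) for ax in (2, 3, 4))

# usage: y = MultiAxisAttention3D(dim=128)(torch.randn(1, 128, 16, 16, 16))
```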
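The quantization and conditioning steps follow standard building blocks: nearest-neighbor codebook lookup with a straight-through gradient, as in the original VQ-VAE of van den Oord et al. (2017), and FiLM, which maps a conditioning vector to per-channel scale and shift. How NeuroQuant wires these together is only summarized above; the sketch below is a generic PyTorch rendering with assumed names and hyperparameters.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class VectorQuantizer(nn.Module):
    """Shared codebook with nearest-neighbor lookup and a straight-through
    gradient, as in standard VQ-VAEs (van den Oord et al., 2017)."""

    def __init__(self, num_codes=1024, dim=128, beta=0.25):
        super().__init__()
        self.codebook = nn.Embedding(num_codes, dim)
        self.codebook.weight.data.uniform_(-1.0 / num_codes, 1.0 / num_codes)
        self.beta = beta  # commitment-loss weight

    def forward(self, z):
        # z: (B, C, D, H, W) anatomical features; quantize each voxel's vector
        B, C, D, H, W = z.shape
        flat = z.permute(0, 2, 3, 4, 1).reshape(-1, C)            # (B*D*H*W, C)
        idx = torch.cdist(flat, self.codebook.weight).argmin(1)   # nearest code
        zq = self.codebook(idx).view(B, D, H, W, C).permute(0, 4, 1, 2, 3)
        # codebook loss pulls codes toward encodings; commitment loss the reverse
        vq_loss = F.mse_loss(zq, z.detach()) + self.beta * F.mse_loss(z, zq.detach())
        zq = z + (zq - z).detach()  # straight-through estimator
        return zq, vq_loss

class FiLM(nn.Module):
    """Feature-wise Linear Modulation: map the appearance vector to
    per-channel scale and shift applied to decoder features."""

    def __init__(self, app_dim, feat_ch):
        super().__init__()
        self.proj = nn.Linear(app_dim, 2 * feat_ch)

    def forward(self, h, z_app):
        # h: (B, feat_ch, D, H, W); z_app: (B, app_dim)
        gamma, beta = self.proj(z_app).chunk(2, dim=1)
        gamma = gamma[:, :, None, None, None]
        beta = beta[:, :, None, None, None]
        # (1 + gamma) keeps the modulation near identity early in training
        return (1 + gamma) * h + beta
```

In this reading, the decoder reconstructs anatomy from the quantized codes while FiLM injects the appearance vector to "re-style" that anatomy as a T1 or T2 volume.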
The model's ability to create a unified, disentangled representation of brain anatomy across different MRI modalities makes it a strong foundation for downstream tasks. It enables more accurate medical image synthesis, such as generating one modality from another (e.g., a T2 scan from a T1 scan), and supports cross-modal analysis for research and clinical applications, potentially improving diagnostics and treatment planning.
- Uses a dual-stream 3D encoder to separate anatomical structures from modality-specific appearance, enabling a shared latent space across T1 and T2 MRI scans.
- Employs a joint 2D/3D training strategy to account for the slice-based acquisition of real 3D MRI data, improving practical applicability (see the training-loss sketch after this list).
- Demonstrates superior reconstruction fidelity on two multi-modal datasets, providing a scalable foundation for generative tasks and cross-modal analysis.
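A joint 2D/3D objective can be as simple as adding a slice-level reconstruction term to the full-volume loss. The following sketch assumes that formulation; the weighting `lambda_2d` and the random slice-axis sampling are assumptions for illustration, not details from the paper.

```python
import torch
import torch.nn.functional as F

def joint_recon_loss(x, x_hat, lambda_2d=0.5):
    """x, x_hat: (B, C, D, H, W). Full-volume 3D loss plus a 2D loss on a
    randomly sampled slice along a randomly chosen spatial axis, mimicking
    slice-based acquisition. lambda_2d is an assumed weighting."""
    loss_3d = F.l1_loss(x_hat, x)

    axis = int(torch.randint(2, 5, (1,)))        # pick the D, H, or W axis
    i = int(torch.randint(x.shape[axis], (1,)))  # pick a slice index
    loss_2d = F.l1_loss(x_hat.select(axis, i), x.select(axis, i))

    return loss_3d + lambda_2d * loss_2d
```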
Why It Matters
Enables more accurate synthesis of medical images and robust cross-modal analysis, advancing AI-assisted diagnostics and neuroscience research.