Research & Papers

MindAlign reads brain waves with 54% accuracy in zero-shot visual decoding

arXiv q-bio.NC May 26, 2026

⚡EEG + AI = 83% top-5 accuracy decoding what you're seeing

Deep Dive

Visual decoding from brain signals has long been a challenge at the intersection of computer vision and neuroscience. A new framework called MindAlign, developed by researchers from multiple institutions, tackles this with a tri-modal contrastive approach that aligns EEG, visual, and textual representations in a unified latent space. The two-stage design first pre-trains an EEG encoder via masked reconstruction on unlabeled trials to learn spatio-temporal regularities, then jointly aligns EEG, images, and LLM-generated text descriptions through contrastive learning. Text acts as a semantic regularizer, injecting linguistic structure without overwhelming the primary EEG-image signal. The encoder incorporates subject-specific adaptation, graph-attention over channels, and temporal-spatial convolutional embeddings.
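To make the alignment stage concrete, here is a minimal sketch of a tri-modal contrastive objective in plain Python. It is not the authors' implementation: the symmetric InfoNCE form, the temperature of 0.07, the `text_weight` of 0.1, and all function names are illustrative assumptions. The only idea taken from the description above is that EEG-image alignment is the primary term and the text term is down-weighted so that it acts as a regularizer.

```python
import math

def l2_normalize(v):
    # Scale a vector to unit length (guarding against an all-zero vector).
    norm = math.sqrt(sum(x * x for x in v)) or 1.0
    return [x / norm for x in v]

def info_nce(queries, keys, temperature=0.07):
    # Mean cross-entropy of matching queries[i] to keys[i] against every key in the batch.
    q = [l2_normalize(v) for v in queries]
    k = [l2_normalize(v) for v in keys]
    total = 0.0
    for i, qi in enumerate(q):
        logits = [sum(a * b for a, b in zip(qi, kj)) / temperature for kj in k]
        m = max(logits)  # subtract the max for a numerically stable log-sum-exp
        log_denom = m + math.log(sum(math.exp(z - m) for z in logits))
        total += log_denom - logits[i]
    return total / len(q)

def symmetric_info_nce(a, b, temperature=0.07):
    # Average the a->b and b->a retrieval directions.
    return 0.5 * (info_nce(a, b, temperature) + info_nce(b, a, temperature))

def tri_modal_loss(eeg, img, txt, text_weight=0.1, temperature=0.07):
    # Primary EEG-image term plus a down-weighted EEG-text term.
    # text_weight is a hypothetical hyperparameter, not a value from the paper.
    primary = symmetric_info_nce(eeg, img, temperature)
    regularizer = symmetric_info_nce(eeg, txt, temperature)
    return primary + text_weight * regularizer

# Toy batch of two trials with 2-D embeddings.
eeg = [[1.0, 0.0], [0.0, 1.0]]
img = [[0.9, 0.1], [0.1, 0.9]]
txt = [[0.8, 0.2], [0.2, 0.8]]
aligned = tri_modal_loss(eeg, img, txt)
shuffled = tri_modal_loss(eeg, img[::-1], txt)  # images paired with the wrong trials
```

In this toy batch the correctly paired embeddings yield a much lower loss than the shuffled pairing, which is the signal contrastive training uses to pull matching EEG, image, and text representations together.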

On the Things-EEG2 200-way zero-shot benchmark, MindAlign achieves 54.1% Top-1 and 83.4% Top-5 accuracy, compared with 32.4% Top-1 and 64.0% Top-5 for the previous best baseline. Paired Wilcoxon tests confirm that the improvement is significant (p < 0.01) across all in-subject baselines. Analysis reveals that compact embedding geometries (CN-CLIP) outperform much larger backbones, and that decoding aligns with the established neurophysiology of visual processing. The framework also generalizes to Things-MEG data. This work is a critical step toward robust, semantically grounded visual decoding from non-invasive temporal neural signals. The source code is publicly available.
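For context on the metric: in a 200-way task, random guessing yields 1/200 = 0.5% Top-1 and 5/200 = 2.5% Top-5 accuracy. The sketch below shows one common way Top-k retrieval accuracy is computed from a similarity matrix between EEG embeddings and candidate image embeddings. The function name, the `targets` argument, and the toy scores are illustrative assumptions, not the paper's evaluation code.

```python
def top_k_accuracy(similarity, targets, k):
    # similarity[i][j]: score of EEG trial i against candidate image j.
    # targets[i]: index of the correct candidate for trial i.
    hits = 0
    for row, target in zip(similarity, targets):
        # Rank candidate indices from highest to lowest score.
        ranked = sorted(range(len(row)), key=lambda j: row[j], reverse=True)
        if target in ranked[:k]:
            hits += 1
    return hits / len(similarity)

# Toy 3-way example (the benchmark itself uses 200 candidates).
similarity = [
    [0.9, 0.1, 0.0],  # trial 0: correct candidate ranked first
    [0.2, 0.3, 0.8],  # trial 1: correct candidate ranked second
    [0.1, 0.2, 0.7],  # trial 2: correct candidate ranked first
]
targets = [0, 1, 2]

top1 = top_k_accuracy(similarity, targets, k=1)
top2 = top_k_accuracy(similarity, targets, k=2)
```

Here two of the three trials are correct at rank 1 and all three fall within the top 2, so Top-1 is 2/3 and Top-2 is 1.0.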

Key Points

54.1% Top-1 zero-shot accuracy on Things-EEG2, up from 32.4% baseline
Tri-modal contrastive learning aligns EEG, images, and LLM-generated text
Generalizes to MEG data and uses compact CN-CLIP embeddings for efficiency

Why It Matters

Non-invasive brain-computer interfaces get a major accuracy boost, bringing mind-reading closer to practical use.
