Image & Video

STAMBRIDGE decodes what you see: EEG-to-image retrieval with 65.95% Top-5 accuracy

A new framework achieves 34.50% Top-1 accuracy on 200-way zero-shot image retrieval from EEG signals.

Deep Dive

A team of researchers from multiple institutions has introduced STAMBRIDGE, a novel two-stage framework for decoding visual experiences from EEG signals. The first stage, Spectral-Temporal Amplitude-aware Modulation (STAM), replaces traditional hard frequency masking with amplitude-derived soft channel weighting combined with multi-scale temporal convolutions. This preserves frequency-aware transients while reducing time-domain ringing artifacts, producing well-conditioned EEG representations. The second stage, Mid-Feature Semantic Bridge (MFSB), constructs a regularized intermediate space through directed cross-modal interactions, enabling staged distillation and more stable semantic alignment between EEG features and vision-language spaces.
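The STAM idea can be made concrete with a minimal plain-Python sketch. This is not the authors' implementation. The RMS amplitude statistic, the softmax weighting, the box-shaped kernels and all function names (soft_channel_weights, temporal_conv, stam_like_block) are hypothetical choices for illustration. The sketch shows the two ingredients the paragraph describes: every channel keeps a non-zero, amplitude-derived weight instead of being masked out, and the reweighted signal is then filtered at several temporal scales in parallel.

```python
import math


def soft_channel_weights(eeg, temperature=1.0):
    """Return one soft weight per channel, derived from its RMS amplitude.

    Hypothetical stand-in for STAM's amplitude-derived weighting: a softmax
    over per-channel RMS values, so no channel is zeroed out as it would be
    under a hard 0/1 mask.
    """
    rms = [math.sqrt(sum(x * x for x in ch) / len(ch)) for ch in eeg]
    peak = max(rms)  # subtract the max for numerical stability
    exps = [math.exp((r - peak) / temperature) for r in rms]
    total = sum(exps)
    return [e / total for e in exps]


def temporal_conv(signal, kernel):
    """Same-length 1-D convolution with a centred, odd-length kernel.

    Samples outside the signal are treated as zero. As in deep-learning conv
    layers, this is technically cross-correlation (identical for symmetric kernels).
    """
    half = len(kernel) // 2
    n = len(signal)
    out = []
    for t in range(n):
        acc = 0.0
        for j, w in enumerate(kernel):
            idx = t + j - half
            if 0 <= idx < n:
                acc += w * signal[idx]
        out.append(acc)
    return out


def stam_like_block(eeg, kernel_sizes=(3, 5, 9), temperature=1.0):
    """Soft channel weighting followed by multi-scale temporal filtering.

    eeg: list of channels, each a list of samples of equal length.
    Returns (weights, features), where features[c][s] is channel c filtered
    at scale kernel_sizes[s]. Box kernels stand in for learned weights.
    """
    weights = soft_channel_weights(eeg, temperature)
    n_channels = len(eeg)
    features = []
    for w, ch in zip(weights, eeg):
        # Rescale so the average weight is 1. Each channel is only multiplied
        # by a scalar, so its frequency content is not truncated.
        scaled = [w * n_channels * x for x in ch]
        per_scale = [temporal_conv(scaled, [1.0 / k] * k) for k in kernel_sizes]
        features.append(per_scale)
    return weights, features


if __name__ == "__main__":
    eeg = [
        [math.sin(0.3 * t) for t in range(32)],        # strong channel
        [0.2 * math.sin(0.3 * t) for t in range(32)],  # weak channel
    ]
    weights, feats = stam_like_block(eeg)
    print("weights:", [round(w, 3) for w in weights])
    print("shape:", len(feats), "channels x", len(feats[0]), "scales x",
          len(feats[0][0]), "samples")
```

Scaling a channel by a constant leaves its spectrum intact, whereas zeroing a band of frequency bins (a brick-wall mask) is known to cause ringing around sharp transients. That contrast is the intuition behind soft weighting. In a trained model, the fixed box kernels would be replaced by learned convolution weights.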

On the THINGS-EEG benchmark, STAMBRIDGE achieves competitive 200-way zero-shot retrieval performance: 34.50% Top-1 and 65.95% Top-5 accuracy. The learned embeddings can also be fed into a diffusion model to reconstruct semantically coherent images from brain activity, which the authors present as evidence of robust EEG-to-vision semantic alignment. The paper is available on arXiv, and the code is publicly available. This work could open up new applications in brain-computer interfaces and AI-assisted communication for paralyzed patients.
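The 200-way zero-shot metric works as follows. Each EEG trial's embedding is compared against the embeddings of 200 candidate images that were not used in training. A trial counts as a Top-k hit if its true image is among the k highest-scoring candidates. Chance level is therefore 1/200 = 0.5% for Top-1 and 5/200 = 2.5% for Top-5. The sketch below assumes cosine similarity as the score and that trial i belongs to image i. The function names are hypothetical, and this is not the benchmark's official evaluation script.

```python
import math
import random


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def topk_accuracy(eeg_emb, img_emb, ks=(1, 5)):
    """N-way retrieval accuracy where eeg_emb[i] should retrieve img_emb[i].

    The rank of the true image is the number of candidates that score
    strictly higher, so ties are resolved in the query's favour.
    """
    hits = {k: 0 for k in ks}
    for i, query in enumerate(eeg_emb):
        scores = [cosine(query, cand) for cand in img_emb]
        rank = sum(1 for s in scores if s > scores[i])
        for k in ks:
            if rank < k:
                hits[k] += 1
    return {k: hits[k] / len(eeg_emb) for k in ks}


if __name__ == "__main__":
    random.seed(0)
    dim, n_way = 16, 200
    images = [[random.gauss(0, 1) for _ in range(dim)] for _ in range(n_way)]
    # Synthetic "EEG" embeddings: the true image vector plus heavy Gaussian noise.
    eeg = [[x + random.gauss(0, 2.0) for x in img] for img in images]
    print(topk_accuracy(eeg, images))
```

With real data, eeg_emb would come from the trained EEG encoder and img_emb from the vision-language image encoder. The scoring and ranking logic stays the same.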

Key Points
  • STAM uses amplitude-derived soft channel weighting and multi-scale temporal convolutions instead of hard frequency masking, reducing ringing artifacts.
  • MFSB constructs a regularized intermediate space for stable cross-modal alignment via directed interactions.
  • Achieves 34.50% Top-1 and 65.95% Top-5 accuracy on 200-way zero-shot retrieval on the THINGS-EEG benchmark, plus diffusion-based image reconstruction.

Why It Matters

Could enable non-invasive decoding of visual experience for brain-computer interfaces (BCIs), with potential to assist communication for locked-in patients.