
New AI reads your mind: EEG-to-image retrieval hits 86% top-1 accuracy

Researchers decode brain signals to retrieve and reconstruct viewed images, reaching 98.55% top-5 retrieval accuracy

Deep Dive

A team of researchers (Chi Kit Wong, Yan Liu, Haowen Yan) has published a paper on arXiv describing a system that decodes visual stimuli from EEG signals recorded while a person views natural images. The system tackles two tasks: EEG-to-image retrieval and EEG-to-image reconstruction. For retrieval, the authors implemented a multi-level blurring approach enhanced with biologically inspired EVNet features and trained it with the InfoNCE contrastive loss. Evaluated over 10 random seeds on a single subject, the retrieval model achieved a mean final-epoch Top-1 accuracy of 86.30% and Top-5 accuracy of 98.55% against a pool of 200 candidate images. In other words, the correct image ranked first in about 86% of evaluated trials and appeared among the top five in nearly 99% of them.
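The retrieval setup can be sketched in a few lines of NumPy. This is a minimal illustration, not the authors' code: it assumes EEG and image embeddings already live in a shared space, uses a symmetric InfoNCE formulation with a hypothetical temperature of 0.07, and omits the multi-level blurring and EVNet components entirely. The helper names (`info_nce`, `top_k_accuracy`) and the synthetic data are illustrative only. Top-k accuracy counts a trial as correct when the true image is among the k candidates most similar to the EEG embedding.

```python
import numpy as np

def l2_normalize(x, axis=-1, eps=1e-8):
    # Scale each row to unit length so dot products become cosine similarities
    return x / (np.linalg.norm(x, axis=axis, keepdims=True) + eps)

def log_softmax(logits, axis):
    # Numerically stable log-softmax along the given axis
    shifted = logits - logits.max(axis=axis, keepdims=True)
    return shifted - np.log(np.exp(shifted).sum(axis=axis, keepdims=True))

def info_nce(eeg_emb, img_emb, temperature=0.07):
    """Symmetric InfoNCE: row i of eeg_emb is the positive pair of row i of img_emb.

    The temperature value is an assumption, not taken from the paper.
    """
    logits = l2_normalize(eeg_emb) @ l2_normalize(img_emb).T / temperature  # (N, N)
    idx = np.arange(len(logits))
    loss_eeg_to_img = -log_softmax(logits, axis=1)[idx, idx].mean()  # each EEG picks its image
    loss_img_to_eeg = -log_softmax(logits, axis=0)[idx, idx].mean()  # each image picks its EEG
    return (loss_eeg_to_img + loss_img_to_eeg) / 2

def top_k_accuracy(eeg_emb, img_emb, k):
    """Fraction of EEG trials whose true image (same row index) is among the k most similar candidates."""
    sims = l2_normalize(eeg_emb) @ l2_normalize(img_emb).T
    top_k = np.argsort(-sims, axis=1)[:, :k]
    hits = (top_k == np.arange(len(sims))[:, None]).any(axis=1)
    return hits.mean()

# Usage on synthetic data: 200 candidate image embeddings and noisy "EEG" embeddings
rng = np.random.default_rng(0)
images = rng.normal(size=(200, 64))
eeg = images + 0.5 * rng.normal(size=images.shape)
print("InfoNCE loss:", info_nce(eeg, images))
print("Top-1:", top_k_accuracy(eeg, images, 1), "Top-5:", top_k_accuracy(eeg, images, 5))
```

With 200 candidates, chance level is 0.5% for Top-1 and 2.5% for Top-5, which puts the reported 86.30% and 98.55% in context.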

For reconstruction, the team built CognitionCapturerPro, which aligns EEG representations to multimodal CLIP embeddings (image, text, depth, and edge embeddings) and then synthesizes images with SDXL-Turbo conditioned via IP-Adapter. Averaged over 10 seeds, the reconstruction model achieved a CLIP score of 0.903 with ViT-H-14 and 0.870 with ViT-L/14, along with an SSIM of 0.409. The high CLIP scores suggest that the generated images capture the semantic content of the original stimuli, while the moderate SSIM indicates that fine-grained, pixel-level fidelity remains a challenge. The paper runs to 16 pages and states that code is available; its results mark a step toward brain-computer interfaces that decode rich visual content directly from neural activity.
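SSIM compares luminance, contrast, and structure between two images, so it penalizes reconstructions that get the object category right but the layout wrong. The sketch below computes a simplified, single-window (global) SSIM in NumPy with the commonly used constants K1 = 0.01 and K2 = 0.03. The standard index averages the same formula over sliding local windows, and the article does not say which SSIM variant, resolution, or color handling the paper used, so treat this as an illustration only. The function name `global_ssim` is hypothetical, and CLIP-based scores, which compare embeddings from a pretrained model, are not reproduced here.

```python
import numpy as np

def global_ssim(x, y, data_range=1.0, k1=0.01, k2=0.03):
    """Simplified SSIM computed over the whole image as a single window.

    Standard SSIM averages this formula over local windows; this global
    version is for illustration only.
    """
    x = np.asarray(x, dtype=np.float64)
    y = np.asarray(y, dtype=np.float64)
    c1 = (k1 * data_range) ** 2  # stabilizes the luminance term
    c2 = (k2 * data_range) ** 2  # stabilizes the contrast/structure term
    mu_x, mu_y = x.mean(), y.mean()
    var_x, var_y = x.var(), y.var()
    cov_xy = ((x - mu_x) * (y - mu_y)).mean()
    numerator = (2 * mu_x * mu_y + c1) * (2 * cov_xy + c2)
    denominator = (mu_x ** 2 + mu_y ** 2 + c1) * (var_x + var_y + c2)
    return numerator / denominator

# Usage on synthetic grayscale "images" with values in [0, 1]
rng = np.random.default_rng(0)
stimulus = rng.random((64, 64))
noisy = np.clip(stimulus + 0.2 * rng.normal(size=stimulus.shape), 0.0, 1.0)
inverted = 1.0 - stimulus  # pixel values flipped, so structure is anti-correlated
print(global_ssim(stimulus, stimulus))
print(global_ssim(stimulus, noisy))
print(global_ssim(stimulus, inverted))
```

Because SSIM rewards pixel-level agreement, a reconstruction can match the stimulus semantically while still scoring modestly on SSIM, which is the pattern the reported numbers show.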

Key Points
  • EEG-to-image retrieval achieves 86.30% Top-1 and 98.55% Top-5 accuracy among 200 candidates (single subject, mean over 10 seeds) using multi-level blurring and EVNet features.
  • CognitionCapturerPro reconstructs images by aligning EEG to multimodal CLIP embeddings (image, text, depth, edge) and generating with SDXL-Turbo + IP-Adapter.
  • Reconstruction achieves a CLIP score of 0.903 (ViT-H-14) and an SSIM of 0.409, supporting the feasibility of decoding visual representations from EEG signals.

Why It Matters

This line of work could open doors to non-invasive brain-computer interfaces that reconstruct visual experiences, potentially aiding communication for locked-in patients, although the reported results come from a single subject.