SIMON: Saliency-aware Integrative Multi-view Object-centric Neural Decoding
EEG-to-image retrieval hits a new SOTA by mimicking human visual attention and foveation.
A team of researchers from National Yang Ming Chiao Tung University has developed SIMON (Saliency-aware Integrative Multi-view Object-centric Neural Decoding), a novel framework for zero-shot EEG-to-image retrieval. The work, published on arXiv, addresses a fundamental flaw in prior EEG decoding methods: they assume a fixed, center-focused view of visual stimuli, which conflicts with how humans actually pay attention to salient objects. SIMON combines foreground segmentation and saliency prediction to dynamically select fixation points through Saliency-Aware Sampling (SAS), then generates multiple foveated views that emphasize informative object regions while suppressing background clutter.
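The paper's exact implementation isn't reproduced here, but the core SAS idea described above can be sketched in a few lines: sample fixation centers in proportion to foreground-masked saliency, then crop a foveated view around each center. The function names, crop size, and sampling scheme below are illustrative assumptions, not the authors' code.

```python
import numpy as np

def saliency_aware_sampling(saliency, fg_mask, num_views=4, rng=None):
    """Sketch of Saliency-Aware Sampling (SAS): draw fixation centers
    with probability proportional to saliency inside the foreground mask.
    saliency: (H, W) non-negative map; fg_mask: (H, W) binary mask."""
    rng = rng or np.random.default_rng(0)
    weights = (saliency * fg_mask).ravel()
    weights = weights / weights.sum()  # assumes the foreground has nonzero saliency
    idx = rng.choice(weights.size, size=num_views, replace=False, p=weights)
    ys, xs = np.unravel_index(idx, saliency.shape)
    return list(zip(ys.tolist(), xs.tolist()))

def foveated_views(image, centers, crop=112):
    """Extract a square crop around each fixation center, emphasizing
    the salient object region and dropping background clutter."""
    h, w = image.shape[:2]
    views = []
    for y, x in centers:
        y0 = int(np.clip(y - crop // 2, 0, h - crop))
        x0 = int(np.clip(x - crop // 2, 0, w - crop))
        views.append(image[y0:y0 + crop, x0:x0 + crop])
    return views
```

Assuming a 224x224 stimulus image with a saliency map and foreground mask from any off-the-shelf predictor and segmenter, `foveated_views(img, saliency_aware_sampling(sal, mask))` would yield the multi-view crops that are then passed to the image encoder.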
On the THINGS-EEG benchmark, SIMON achieves state-of-the-art performance, with Top-1 accuracy of 69.7% in the intra-subject setting and 19.6% in the inter-subject setting, consistently outperforming recent competitive baselines. The framework remains robust across different sampling granularities, EEG channel topologies, and visual/brain encoder backbones. The code and pretrained models are publicly available, enabling further research into brain-computer interfaces and neural decoding of visual perception.
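For readers unfamiliar with the evaluation, zero-shot retrieval Top-1 accuracy is typically computed by matching each EEG embedding against candidate image embeddings via cosine similarity. The minimal sketch below assumes row-aligned, paired embedding matrices from the two encoders; it is a generic illustration of the metric, not SIMON's evaluation code.

```python
import numpy as np

def top1_retrieval_accuracy(eeg_emb, img_emb):
    """Each EEG embedding retrieves its nearest image embedding by
    cosine similarity; Top-1 accuracy is the fraction of queries
    whose best match is the correctly paired image (row i <-> row i)."""
    e = eeg_emb / np.linalg.norm(eeg_emb, axis=1, keepdims=True)
    v = img_emb / np.linalg.norm(img_emb, axis=1, keepdims=True)
    sims = e @ v.T  # (n_queries, n_candidates) similarity matrix
    return float((sims.argmax(axis=1) == np.arange(len(e))).mean())
```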
- SIMON achieves 69.7% Top-1 accuracy on THINGS-EEG (intra-subject), a new state-of-the-art
- Uses Saliency-Aware Sampling (SAS) to select fixation centers, mimicking human visual attention
- Open-source release of code and models for zero-shot EEG-to-image retrieval
Why It Matters
Moves brain-computer interfaces closer to practical visual decoding by accounting for real human attention patterns.