Structure-Guided Diffusion Model for EEG-Based Visual Cognition Reconstruction
Researchers decode complex visual imagery from brain waves with a two-stage diffusion model...
A team led by Yongxiang Lian has introduced the Structure-Guided Diffusion Model (SGDM), a framework for reconstructing visual imagery directly from EEG brain signals. Unlike prior methods that were limited to natural images or categorical outputs, SGDM explicitly incorporates structural geometry (edges, shapes, and object boundaries) into the generation process. The model works in two stages. In the first, a structurally supervised variational autoencoder (VAE) extracts high-level visual structure, while a spatiotemporal EEG encoder is trained with contrastive learning to map brain signals into the same visual embedding space. In the second, the structural features decoded from EEG condition a ControlNet-guided diffusion model, which generates images that match what a person is seeing or imagining.
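The contrastive alignment step in the first stage can be illustrated with a small sketch. The summary does not say which contrastive loss SGDM uses, so the code below assumes a CLIP-style symmetric InfoNCE loss, a common choice for pairing two embedding spaces. The function names, the temperature of 0.07, and the toy random embeddings are illustrative assumptions, not details from the paper.

```python
import math
import random


def normalize(v):
    """Scale a vector to unit length so dot products become cosine similarities."""
    norm = math.sqrt(sum(x * x for x in v)) or 1.0
    return [x / norm for x in v]


def cross_entropy_on_diagonal(logits):
    """Mean cross-entropy where the correct class for row i is column i."""
    total = 0.0
    for i, row in enumerate(logits):
        m = max(row)  # subtract the max for numerical stability
        log_sum_exp = m + math.log(sum(math.exp(x - m) for x in row))
        total += log_sum_exp - row[i]
    return total / len(logits)


def symmetric_info_nce(eeg_emb, img_emb, temperature=0.07):
    """Assumed CLIP-style loss: EEG trial i should match image i and vice versa."""
    eeg = [normalize(v) for v in eeg_emb]
    img = [normalize(v) for v in img_emb]
    # Pairwise cosine similarities, sharpened by the temperature.
    logits = [[sum(a * b for a, b in zip(e, g)) / temperature for g in img]
              for e in eeg]
    transposed = [list(col) for col in zip(*logits)]
    eeg_to_img = cross_entropy_on_diagonal(logits)
    img_to_eeg = cross_entropy_on_diagonal(transposed)
    return 0.5 * (eeg_to_img + img_to_eeg)


# Toy data standing in for encoder outputs (hypothetical, not real EEG).
random.seed(0)
img_emb = [[random.gauss(0, 1) for _ in range(16)] for _ in range(8)]
eeg_matched = [[x + 0.05 * random.gauss(0, 1) for x in v] for v in img_emb]
eeg_mismatched = eeg_matched[1:] + eeg_matched[:1]  # every pairing is wrong

loss_matched = symmetric_info_nce(eeg_matched, img_emb)
loss_mismatched = symmetric_info_nce(eeg_mismatched, img_emb)
print(loss_matched, loss_mismatched)
```

On this toy data the matched pairing should give a much lower loss than the mismatched one. Minimizing a loss of this kind is what pulls each EEG embedding toward the embedding of the image that evoked it and away from the other images in the batch.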
Tested on the abstract visual object dataset Kilogram and the natural image dataset THINGS, SGDM outperformed existing methods in both low-level visual fidelity (e.g., texture, brightness) and high-level semantic accuracy (e.g., object identity, scene context). The researchers also performed a spatiotemporal analysis of the EEG signals, which revealed hierarchical structural encoding patterns that align with known neural dynamics of visual cognition. This suggests that SGDM's representations reflect how the brain processes visual information, rather than the model acting purely as a black box. The work extends brain-computer interfaces (BCIs) beyond simple commands or category labels, opening the door to decoding complex visual thoughts for applications such as communication aids for paralyzed patients or next-generation neural prosthetics.
- SGDM uses a two-stage generative mechanism: structural VAE + EEG encoder aligned via contrastive learning, then ControlNet-guided diffusion
- Outperforms prior methods on both abstract (Kilogram) and natural (THINGS) datasets, in low-level visual fidelity as well as high-level semantic accuracy
- Spatiotemporal EEG analysis reveals hierarchical structural encoding patterns consistent with neural dynamics of visual cognition
Why It Matters
Enables BCIs to decode complex visual thoughts rather than only category labels, increasing the degrees of freedom available for intention decoding.