Sparse Autoencoders crack open EEG foundation models for clinical trust

arXiv cs.LG May 15, 2026

⚡New method reveals hidden representations in three major EEG transformers...

Deep Dive

A multi-institutional team led by William Lehn-Schiøler applied TopK Sparse Autoencoders (SAEs) to three leading EEG foundation models (SleepFM, REVE, and LaBraM) to expose their internal representations. The SAEs extract sparse feature dictionaries from model embeddings, and the resulting features are then mapped to clinical concepts such as abnormality, age, sex, and medication usage. This grounding lets the researchers benchmark monosemanticity (one feature corresponds to one concept) and entanglement across architectures. A single hyperparameter procedure, driven by an intrinsic dictionary health audit, transfers robustly across all three models, which suggests the approach may generalize to interpreting EEG models more broadly.
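To make the TopK mechanism concrete, here is a minimal, dependency-free sketch of a single TopK SAE forward pass. It is an illustration under stated assumptions, not the authors' implementation: the function name `topk_sae_forward`, the weight layout (one row per dictionary feature), the subtraction of the decoder bias before encoding, and the clamping of negative activations to zero are all choices made for this example. The summary does not describe how the paper's SAEs are parameterized or trained.

```python
def topk_sae_forward(x, W_enc, b_enc, W_dec, b_dec, k):
    """One forward pass of a TopK sparse autoencoder (illustrative sketch).

    x      : embedding vector, length d
    W_enc  : n_features rows, each of length d (encoder directions)
    b_enc  : n_features encoder biases
    W_dec  : n_features rows, each of length d (decoder directions)
    b_dec  : decoder bias, length d
    k      : number of features allowed to stay active
    """
    # Pre-activation of every dictionary feature on the centered input.
    pre = [
        sum(w * (xi - bd) for w, xi, bd in zip(row, x, b_dec)) + b
        for row, b in zip(W_enc, b_enc)
    ]
    # Keep only the k largest pre-activations; all others are zeroed.
    top = sorted(range(len(pre)), key=lambda i: pre[i], reverse=True)[:k]
    keep = set(top)
    # Assumption: negative survivors are clamped to zero.
    z = [max(p, 0.0) if i in keep else 0.0 for i, p in enumerate(pre)]
    # Reconstruction: decoder bias plus the weighted active directions.
    x_hat = [float(v) for v in b_dec]
    for j in top:
        for d in range(len(x_hat)):
            x_hat[d] += z[j] * W_dec[j][d]
    return z, x_hat


# Toy example: identity encoder and decoder, so each feature reads one input dimension.
identity = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
zeros = [0.0, 0.0, 0.0]
z, x_hat = topk_sae_forward([3.0, 1.0, 2.0], identity, zeros, identity, zeros, k=2)
print(z)      # [3.0, 0.0, 2.0]
print(x_hat)  # [3.0, 0.0, 2.0]
```

With k=2, only the two strongest features survive, so the smallest input dimension is dropped from the reconstruction. That hard cap on active features is what makes the resulting dictionary sparse.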

The team introduced a 'target vs. off-target' probe area metric to quantify steering selectivity, meaning how precisely a concept can be manipulated without affecting others. The metric revealed three operational regimes: selectively steerable, encoded but entangled, and non-encoded. The team also exposed 'wrecking-ball' interventions that collapse global model performance, and clinical entanglements such as age-pathology confounding, where suppressing one concept inevitably corrupts another. Finally, a spectral decoder translates these interventions into amplitude spectrum changes, such as pathological slow-wave suppression and alpha-band restoration, which makes latent manipulations physiologically interpretable. By tying latent interventions to spectral effects, the framework moves EEG models closer to the transparency that clinical use in neurology and psychiatry requires.
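The summary does not give the exact definition of the probe area metric, so the following is only a hypothetical sketch of how a target vs. off-target comparison could be computed. It assumes that a probe score is recorded for each concept at several steering strengths, that "area" means the trapezoidal area under the absolute shift from the unsteered score, and that two thresholds (`min_target` and `max_off`, both invented here) separate the three regimes. None of these names or values come from the paper.

```python
def trapezoid_area(xs, ys):
    """Area under the piecewise-linear curve through (xs, ys)."""
    return sum(
        (xs[i + 1] - xs[i]) * (ys[i] + ys[i + 1]) / 2.0
        for i in range(len(xs) - 1)
    )


def probe_shift_area(strengths, scores):
    """Area of the absolute probe-score shift relative to the first (unsteered) score."""
    baseline = scores[0]
    return trapezoid_area(strengths, [abs(s - baseline) for s in scores])


def classify_regime(strengths, target_scores, off_target_scores,
                    min_target=0.1, max_off=0.05):
    """Assign one of the three regimes named in the article.

    off_target_scores maps concept name -> probe scores per steering strength.
    The thresholds are hypothetical placeholders, not values from the paper.
    """
    target_area = probe_shift_area(strengths, target_scores)
    # Use the worst-affected off-target concept as the off-target area.
    off_area = max(
        probe_shift_area(strengths, scores)
        for scores in off_target_scores.values()
    )
    if target_area < min_target:
        regime = "non-encoded"            # steering barely moves the target probe
    elif off_area > max_off:
        regime = "encoded but entangled"  # target moves, but so do other concepts
    else:
        regime = "selectively steerable"  # target moves, other concepts stay put
    return regime, target_area, off_area


strengths = [0.0, 1.0, 2.0]
abnormality = [0.9, 0.7, 0.5]  # made-up probe scores for the steered concept
print(classify_regime(strengths, abnormality, {"age": [0.8, 0.8, 0.8]})[0])
print(classify_regime(strengths, abnormality, {"age": [0.8, 0.6, 0.4]})[0])
```

In this toy setup the first call lands in the selectively steerable regime, because the age probe is unaffected. The second lands in the encoded but entangled regime, because suppressing abnormality drags the age probe down with it, which mirrors the age-pathology confounding described above.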

Key Points

Applied TopK Sparse Autoencoders to three distinct EEG transformers: SleepFM, REVE, and LaBraM
Introduced 'target vs. off-target' probe area metric to quantify steering selectivity and identify three operational regimes
Discovered 'wrecking-ball' interventions and age-pathology confounding that corrupt model performance when manipulating concepts

Why It Matters

Makes EEG AI transparent for clinical use, enabling safer diagnosis and treatment planning.
