MINE Framework Decodes Brain's Visual Cortex with Mechanistic Interpretability

arXiv q-bio.NC May 19, 2026

⚡New AI method reveals fine-grained voxel-level visual features driving neural activity...

Deep Dive

A team led by Idan Daniel Grosbard, Mor Geva, and Galit Yovel has introduced Mechanistically Interpretable Neural Encoding (MINE), a framework that opens the black box of neural encoding models. Traditional approaches use artificial neural networks to predict cortical responses to natural images, but they remain correlational: they cannot pinpoint which specific image features drive the activity of each millimeter-scale voxel. MINE applies mechanistic interpretability tools to language-aligned image representations (e.g., CLIP-like embeddings) to localize the critical features for each voxel in each image. It then generalizes these per-image attributions into per-voxel functional profiles, which are semantically interpretable descriptions of what that voxel is 'looking for'. This moves beyond coarse category selectivity (e.g., face vs. place areas) to fine-grained functional selectivity within those regions.
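To make the idea of a per-voxel encoding model with per-image attributions concrete, here is a minimal sketch. It is not the authors' method: it assumes a plain ridge regression from image embeddings to voxel responses and uses simple input-times-weight attribution, and all data, dimensions, and function names (`fit_ridge`, `attribute`) are hypothetical stand-ins.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic stand-ins (assumptions, not real data):
# 200 images, 16-dimensional embeddings, 5 voxels.
n_images, n_dims, n_voxels = 200, 16, 5
embeddings = rng.normal(size=(n_images, n_dims))
true_weights = rng.normal(size=(n_dims, n_voxels))
responses = embeddings @ true_weights + 0.1 * rng.normal(size=(n_images, n_voxels))


def fit_ridge(X, Y, alpha=1.0):
    """Closed-form ridge regression; returns one weight column per voxel."""
    d = X.shape[1]
    return np.linalg.solve(X.T @ X + alpha * np.eye(d), X.T @ Y)


def attribute(weights, embedding, voxel):
    """Input-times-weight attribution of each embedding dimension for one voxel.

    For a linear model the attributions sum to the predicted response.
    """
    return embedding * weights[:, voxel]


weights = fit_ridge(embeddings, responses)

# Per-image attribution for image 0 and voxel 2.
attributions = attribute(weights, embeddings[0], voxel=2)

# Embedding dimensions that contribute most to the predicted response.
top_dims = np.argsort(-np.abs(attributions))[:3]
print("top contributing dimensions:", top_dims)
```

In this toy setup the "features" are raw embedding dimensions. The framework described above instead works with language-aligned representations so that the critical features can be described in words.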

To validate the approach, the researchers showed that the per-image descriptions are sufficient to generate synthetic images that elicit voxel responses matching those of the original images, outperforming random or low-attribution controls. More compellingly, counterfactually inserting or removing the predicted features from images shifts activation in the expected direction, providing causal evidence. Per-voxel functional profiles produced even stronger shifts when used for editing, indicating that they faithfully capture each voxel's selectivity. Applied to well-studied category-selective brain regions (e.g., the fusiform face area), MINE recovered known categorical preferences while revealing unique voxel-level structure. The work establishes mechanistic interpretability as a concrete path to discovering and causally validating fine-grained hypotheses about neural function, bridging AI and neuroscience.
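The logic of the counterfactual test can be illustrated with a toy example. This sketch is an assumption-laden stand-in, not the study's pipeline: edits are applied directly in embedding space rather than to images, the voxel is modeled as a hypothetical linear readout, and the "preferred feature" is taken to be the normalized weight vector so that the expected direction of each shift is known in advance.

```python
import numpy as np

rng = np.random.default_rng(1)
n_dims = 16

# Hypothetical weights of a linear encoding model for one voxel (assumption).
voxel_weights = rng.normal(size=n_dims)

# Hypothetical preferred feature: the unit vector along the voxel's weights.
feature = voxel_weights / np.linalg.norm(voxel_weights)


def predict(embedding):
    """Predicted voxel activation under the toy linear model."""
    return float(embedding @ voxel_weights)


def insert_feature(embedding, direction, strength=1.0):
    """Counterfactual insertion: push the embedding along the feature direction."""
    return embedding + strength * direction


def remove_feature(embedding, direction):
    """Counterfactual removal: project the feature direction out of the embedding."""
    return embedding - (embedding @ direction) * direction


# A toy "image" whose component along the feature is exactly 2.0.
background = remove_feature(rng.normal(size=n_dims), feature)
image = background + 2.0 * feature

base = predict(image)
inserted = predict(insert_feature(image, feature))
removed = predict(remove_feature(image, feature))

print(f"removed={removed:.3f}  base={base:.3f}  inserted={inserted:.3f}")
```

If a profile really captures what the voxel responds to, insertion should raise the predicted activation and removal should lower it, which is the pattern the article reports for edited images.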

Key Points

Applies mechanistic interpretability tools to neural encoding, moving beyond correlational black-box models
Generates per-voxel functional profiles with semantically interpretable descriptions of critical image features
Validates via counterfactual editing: inserting/removing predicted features shifts voxel activation as expected

Why It Matters

Bridges AI interpretability and neuroscience, enabling causal tests of which image features drive activity in individual voxels of visual cortex.
