Research & Papers

Brain models miss cognition: third-order stats beat 650M-parameter BFMs

The 650M-parameter BrainLM predicts cognition worse than a linear model with ~80K parameters.

Deep Dive

A new study by Marraffini et al. reports a critical flaw in brain foundation models (BFMs), which are self-supervised Transformers pretrained on fMRI data. Across three state-of-the-art BFMs (including BrainLM at 650M and 111M parameters) and every readout tested, cognition prediction is worse than that of a linear regression on the functional connectivity (FC) matrix, a model with only ~80K parameters. The gap widens with scale: larger models predict cognition worse than smaller ones (BrainLM 650M trails its 111M version), contradicting the assumption that bigger is better.
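For a sense of scale, the ~80K figure is consistent with the upper triangle of an FC matrix over roughly 400 brain regions (400 × 399 / 2 = 79,800 unique region pairs). The parcellation size is not stated here, so the 400-region setting below is an assumption, as are the function names, the ridge penalty, and the random stand-in data. A minimal NumPy sketch of this kind of baseline, not the authors' code:

```python
import numpy as np

def fc_features(ts):
    """ts: (timepoints, regions) array -> vector of unique pairwise correlations."""
    fc = np.corrcoef(ts, rowvar=False)        # (regions, regions) FC matrix
    iu = np.triu_indices(fc.shape[0], k=1)    # upper triangle, diagonal excluded
    return fc[iu]

def fit_ridge(X, y, alpha=1.0):
    """Ridge regression in dual form, cheap when subjects << features."""
    x_mean, y_mean = X.mean(axis=0), y.mean()
    Xc, yc = X - x_mean, y - y_mean
    dual = np.linalg.solve(Xc @ Xc.T + alpha * np.eye(len(y)), yc)
    w = Xc.T @ dual                           # one weight per FC edge
    b = y_mean - x_mean @ w
    return w, b

# Random stand-in data; sizes are illustrative assumptions.
rng = np.random.default_rng(0)
n_subjects, n_timepoints, n_regions = 20, 200, 400
X = np.stack([
    fc_features(rng.standard_normal((n_timepoints, n_regions)))
    for _ in range(n_subjects)
])
y = rng.standard_normal(n_subjects)           # placeholder cognition scores
w, b = fit_ridge(X, y)
print(X.shape, w.shape)                       # (20, 79800) (79800,)
```

The point of the sketch is only the parameter count: one weight per FC edge is the entire model that the BFMs are being compared against.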

The root cause is a 'variance allocation problem': BFM pretraining captures the dominant variance in fMRI (second-order covariance) but destroys the higher-order structure, specifically third-order co-skewness, that actually predicts cognition. The authors propose a linear pipeline that projects fMRI into the subspace that best preserves co-skewness, then computes FC there. This simple method exceeds raw FC and all pretrained BFMs on every dataset and parcellation tested, with no pretraining and no GPU. Finetuning BrainLM with a loss targeted at the co-skewness subspace also recovers its performance, which indicates that the bottleneck is the pretraining objective rather than the architecture.
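To make "project, then compute FC" concrete, here is a rough sketch of one way such a pipeline could look. The projection choice (top left singular vectors of the unfolded co-skewness tensor), the subspace size k, all function names, and the synthetic data are assumptions for illustration; the authors' actual estimator may differ, and in practice the basis would presumably be fit on pooled training data rather than a single scan.

```python
import numpy as np

def standardize(ts):
    """Z-score each region's time series; ts is (timepoints, regions)."""
    return (ts - ts.mean(axis=0)) / ts.std(axis=0)

def coskewness_unfolded(Z):
    """Co-skewness tensor S[i,j,k] = mean_t Z[t,i] Z[t,j] Z[t,k], unfolded to (N, N*N)."""
    T, N = Z.shape
    S = np.einsum("ti,tj,tk->ijk", Z, Z, Z) / T
    return S.reshape(N, N * N)

def coskew_subspace(Z, k):
    """Assumed projection: top-k left singular vectors of the unfolded tensor."""
    U, _, _ = np.linalg.svd(coskewness_unfolded(Z), full_matrices=False)
    return U[:, :k]                       # orthonormal basis, shape (N, k)

def fc_in_subspace(ts, W):
    """Project standardized signals onto W, then compute FC between components."""
    return np.corrcoef(standardize(ts) @ W, rowvar=False)

# Synthetic stand-in: a few skewed sources mixed into 30 regions plus noise.
rng = np.random.default_rng(0)
T, N, k = 500, 30, 5
sources = rng.exponential(size=(T, k))    # skewed, so third-order structure exists
ts = sources @ rng.standard_normal((k, N)) + 0.5 * rng.standard_normal((T, N))

W = coskew_subspace(standardize(ts), k)
fc_sub = fc_in_subspace(ts, W)
print(W.shape, fc_sub.shape)              # (30, 5) (5, 5)
```

Everything here is linear algebra on small matrices, which is consistent with the claim that the method needs no pretraining and no GPU.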

Key Points
  • BrainLM 650M predicts cognition worse than its 111M version, and all BFMs are beaten by a linear FC model with ~80K parameters.
  • The problem is one of variance allocation: pretraining preserves second-order covariance but destroys the third-order co-skewness that predicts cognition.
  • A simple linear pipeline that preserves co-skewness outperforms raw FC and all BFMs; finetuning BFMs with a co-skewness loss matches raw FC performance.

Why It Matters

This challenges the assumption that scale alone improves brain foundation models: bigger models can still fail if the pretraining objective rewards the wrong features.