Untrained CNNs Match Backpropagation at V1: A Systematic RSA Comparison of Four Learning Rules Against Human fMRI
New research shows a CNN's architecture, not its training, dictates how well it mimics the human brain's early vision.
A new computational neuroscience study by researcher Nils Leutenegger challenges assumptions about how AI models learn to see like humans. The research presents a systematic comparison of four learning rules—backpropagation (BP), feedback alignment (FA), predictive coding (PC), and spike-timing-dependent plasticity (STDP)—in identical convolutional neural networks (CNNs). Using Representational Similarity Analysis (RSA) against human fMRI data from the THINGS-fMRI dataset (720 stimuli, 3 subjects), the study included a crucial baseline: an untrained CNN with random weights.
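For readers unfamiliar with RSA, the core computation is simple: build a representational dissimilarity matrix (RDM) over the stimulus set for both the model and the brain data, then correlate the two RDMs. Below is a minimal sketch of that pipeline; the array shapes and the use of correlation distance and Spearman's rho are common RSA conventions, not necessarily the exact choices made in this study, and the random arrays are stand-ins for real CNN activations and fMRI voxel patterns.

```python
import numpy as np
from scipy.spatial.distance import pdist
from scipy.stats import spearmanr

def rdm(patterns):
    """Condensed representational dissimilarity matrix:
    1 - Pearson correlation for each pair of stimulus patterns.
    patterns: (n_stimuli, n_features) array."""
    return pdist(patterns, metric="correlation")

def rsa_score(model_acts, brain_acts):
    """Spearman rho between the model's and the brain's RDMs."""
    rho, _ = spearmanr(rdm(model_acts), rdm(brain_acts))
    return rho

# Toy illustration with random stand-in data (shapes are hypothetical:
# 720 THINGS stimuli, 512 model units, 200 fMRI voxels).
rng = np.random.default_rng(0)
model_acts = rng.standard_normal((720, 512))
brain_acts = rng.standard_normal((720, 200))
print(rsa_score(model_acts, brain_acts))
```

With unrelated random data the score hovers near zero; the study's reported rho values (around 0.07 at V1) reflect the modest but measurable alignment typical of model-to-fMRI comparisons.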
The most striking result is that this untrained model achieved a similarity score (rho = 0.071) statistically indistinguishable from that of a fully trained BP model (rho = 0.072) when compared against early visual cortex (V1/V2) activity. This suggests that the network's architecture, not its learned weights, is the primary driver of early visual alignment with the human brain. Learning rules only made a significant difference in higher visual areas such as the lateral occipital complex (LOC) and inferior temporal cortex (IT), where BP and predictive coding, which relies on local updates, showed superior alignment.
By contrast, the study found that feedback alignment (FA) consistently degraded representations, performing below even the random-weight baseline at V1. These effects held even after controlling for low-level pixel similarity. The findings offer a nuanced, region-specific map of how different AI training methods relate to biological vision, separating the contributions of innate structure from learned experience.
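Controlling for low-level pixel similarity, as mentioned above, is typically done with a partial correlation: correlate the model RDM with the brain RDM after regressing out a pixel-level RDM from both. The sketch below shows one standard recipe for a partial Spearman correlation (rank-transform, remove the covariate by least squares, correlate residuals); the function name and implementation details are illustrative assumptions, not the paper's exact code.

```python
import numpy as np
from scipy.stats import rankdata, pearsonr

def partial_spearman(x, y, z):
    """Spearman correlation of x and y controlling for z.
    Rank-transform all three vectors, regress z's ranks out of
    x's and y's ranks, then Pearson-correlate the residuals.
    Here x/y/z would be model, brain, and pixel RDM vectors."""
    rx, ry, rz = (rankdata(v) for v in (x, y, z))

    def residual(a, b):
        # Least-squares residual of a after removing b (with intercept).
        design = np.column_stack([np.ones_like(b), b])
        coef, *_ = np.linalg.lstsq(design, a, rcond=None)
        return a - design @ coef

    r, _ = pearsonr(residual(rx, rz), residual(ry, rz))
    return r
```

If two RDMs agree only because both track raw pixel overlap, the partial correlation collapses toward zero, which is why this control matters for claims about learned representations.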
- Untrained CNNs with random weights matched backpropagation's alignment with human V1 cortex (rho = 0.071 vs. 0.072, p = 0.43), indicating that early vision alignment is architecture-driven.
- Learning rules only differentiated performance in higher visual areas (LOC/IT), with backpropagation and predictive coding achieving the best alignment.
- Feedback alignment (FA) consistently impaired model representations, performing worse than the random baseline in early visual areas.
Why It Matters
This reframes AI-brain alignment research, emphasizing that a model's core design may be as important as its training for replicating early human vision.