Growing AI networks fails where pruning succeeds—new study explains why
New neurons suffer 'backward starvation' in deep learning, researchers find.
A new paper from Lillo & Cheney challenges the assumption that growing neural networks during training is a symmetric counterpart to pruning. While pruning removes already-trained units, growth inserts brand-new units into an already specialized optimization trajectory. These newborns are 'forward-active but backward-starved': they compute outputs but receive far weaker gradient updates than established neurons. This disadvantage is minor in simple MLP benchmarks but becomes critical in convolutional image classification tasks.
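To make 'forward-active but backward-starved' concrete, here is a minimal sketch of one way such an imbalance could arise. It is an illustration under stated assumptions, not the authors' experimental setup. It assumes a toy one-hidden-layer network, y = sum_j w_out[j] * tanh(w_in[j] * x), with squared-error loss, and assumes the newborn unit is inserted with a near-zero outgoing weight so that it barely disturbs the existing function. The function name, weights, and input are all hypothetical.

```python
import math

def hidden_grads(x, target, w_in, w_out):
    """Return hidden activations and per-unit gradients on the incoming weights.

    Toy model (an assumption for illustration): y = sum_j w_out[j] * tanh(w_in[j] * x)
    with loss 0.5 * (y - target) ** 2.
    """
    h = [math.tanh(w * x) for w in w_in]
    y = sum(a * hj for a, hj in zip(w_out, h))
    err = y - target
    # Chain rule: dL/dw_in[j] = err * w_out[j] * (1 - h_j^2) * x
    grads = [err * a * (1.0 - hj * hj) * x for a, hj in zip(w_out, h)]
    return h, grads

# Hypothetical weights: two established units plus one newborn (last entry).
w_in = [0.8, -0.5, 0.6]
w_out = [1.2, -0.9, 1e-3]  # assumption: newborn starts with a near-zero outgoing weight

h, grads = hidden_grads(x=1.0, target=2.0, w_in=w_in, w_out=w_out)

# Forward pass: the newborn's activation is comparable to its neighbours'.
newborn_activation = abs(h[-1])

# Backward pass: compare the newborn's gradient with the established average.
established = sum(abs(g) for g in grads[:-1]) / (len(grads) - 1)
starvation_ratio = abs(grads[-1]) / established

print(f"newborn activation: {newborn_activation:.3f}")
print(f"newborn / established gradient ratio: {starvation_ratio:.5f}")
```

In this toy case the newborn's hidden activation is of ordinary size, but the gradient reaching its incoming weight is scaled by its tiny outgoing weight, so it learns far more slowly than its neighbours. A ratio like this could serve as a rough per-unit diagnostic; whether it matches the measurement used in the paper is not stated in this summary.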
In those harder settings, pruning achieves higher accuracy when performance is averaged over the full training trajectory or when the final sparse network is retrained from scratch. Growth can match pruning only in continual-learning benchmarks where plasticity loss is a factor, and only if new units get enough training time to integrate into the network. Interventions like optimizer state copying or better initialization help but don't fully close the gap. The authors argue that growth must be evaluated as a time-sensitive optimization process, not merely an architecture-search operator.
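The difference between final accuracy and trajectory-averaged accuracy can be sketched as follows. This assumes the trajectory average is a plain mean over evenly spaced checkpoints, which is one reasonable reading rather than the paper's confirmed definition. The accuracy values are invented purely for illustration and are not results from the study.

```python
def trajectory_average(accuracies):
    """Mean accuracy over checkpoints (assumed to be evenly spaced in training time)."""
    return sum(accuracies) / len(accuracies)

# Made-up checkpoint accuracies for illustration only; NOT results from the paper.
pruned = [0.60, 0.70, 0.75, 0.78, 0.80]
grown = [0.40, 0.55, 0.68, 0.76, 0.80]  # hypothetical: catches up only late in training

print("final accuracy:     pruned", pruned[-1], "grown", grown[-1])
print("trajectory average: pruned", round(trajectory_average(pruned), 3),
      "grown", round(trajectory_average(grown), 3))
```

Two runs that end at the same accuracy can still differ substantially on the averaged metric when one of them spends most of training catching up, which is why the choice of metric matters for a time-sensitive process like growth.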
- Newborn units in neural networks receive significantly weaker gradient signals than existing units ('backward starvation').
- In convolutional image classification, pruning outperforms growth when accuracy is averaged over the training trajectory or when the final network is retrained from scratch.
- Growth becomes competitive with pruning only in continual learning settings, given sufficient integration time for new units.
Why It Matters
This reframes how we design adaptive AI systems—growth needs fundamentally different training strategies than pruning.