Sparse Goodness: How Selective Measurement Transforms Forward-Forward Learning
A new 'top-k goodness' function improves Fashion-MNIST accuracy by 22.6 points, challenging backpropagation's dominance.
Researchers Kamer Ali Yuksel and Hassan Sawaf have published a paper titled 'Sparse Goodness: How Selective Measurement Transforms Forward-Forward Learning,' presenting a breakthrough for the Forward-Forward (FF) algorithm. FF is a promising, biologically plausible alternative to the ubiquitous backpropagation method: it trains neural networks layer by layer using a local 'goodness' function. The researchers systematically explored the design of this function, moving beyond the standard sum-of-squares approach. Their key innovation is sparsity: a 'top-k goodness' function that evaluates only the k most active neurons in a layer, which alone boosted accuracy on the Fashion-MNIST dataset by 22.6 percentage points.
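The paper's exact formulation is not reproduced in this summary, but the contrast is easy to sketch. Assuming ReLU activations and a tunable k (the function names and the value k=50 below are illustrative, not taken from the paper), the standard and sparse goodness measures compare roughly as follows:

```python
import torch

def sum_of_squares_goodness(h: torch.Tensor) -> torch.Tensor:
    # Standard FF goodness: total squared activity of one layer.
    # h has shape (batch, num_neurons), post-activation.
    return h.pow(2).sum(dim=-1)

def topk_goodness(h: torch.Tensor, k: int) -> torch.Tensor:
    # Sparse variant: score only the k most active neurons, so weakly
    # firing units cannot dilute the goodness signal.
    energies = h.pow(2)                      # per-neuron energy
    topk_vals, _ = torch.topk(energies, k, dim=-1)
    return topk_vals.sum(dim=-1)

# FF trains each layer locally: goodness should come out high for "positive"
# (real) inputs and low for "negative" (corrupted) inputs, e.g. via a
# logistic loss on (goodness - threshold).
h = torch.relu(torch.randn(32, 500))         # one hidden layer's activations
g_dense = sum_of_squares_goodness(h)
g_sparse = topk_goodness(h, k=50)            # k is a hypothetical value
```

The intuition is that summing over every neuron lets many weakly active units swamp the signal from the few that actually encode the input, whereas the top-k measure keeps only the most informative ones.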
They further refined this with 'entmax-weighted energy,' a learnable sparse weighting scheme, and a separate technique called FFCL for injecting class information at every layer. The combined approach achieved 87.1% accuracy on Fashion-MNIST with a specific network architecture, a 30.7-percentage-point improvement over the original FF baseline. After testing 11 different goodness functions across multiple architectures, the team identified a consistent principle: adaptive sparsity, focusing on the most informative neurons, is the most critical design choice for making FF networks competitive. This work provides a clear, scalable path for improving a major alternative to backpropagation, potentially unlocking more efficient and brain-like learning in AI systems.
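The entmax-weighted variant can be pictured as replacing the hard top-k cut with learned, sparse weights over per-neuron energies. The sketch below is an assumption about the mechanics, not the authors' code: it uses a simple bisection routine for alpha = 1.5 entmax (the value the paper reports working best) and one learnable logit per neuron; a production implementation would typically rely on a dedicated entmax library with exact gradients.

```python
import torch

def entmax15(z: torch.Tensor, n_iter: int = 50) -> torch.Tensor:
    # 1.5-entmax via bisection on the threshold tau:
    # p_i = max(z_i / 2 - tau, 0) ** 2, with tau chosen so the weights sum to 1.
    z = z / 2.0
    tau_lo = z.max(dim=-1, keepdim=True).values - 1.0   # mass >= 1 here
    tau_hi = z.max(dim=-1, keepdim=True).values          # mass == 0 here
    for _ in range(n_iter):
        tau = (tau_lo + tau_hi) / 2.0
        mass = torch.clamp(z - tau, min=0.0).pow(2).sum(dim=-1, keepdim=True)
        tau_lo = torch.where(mass >= 1.0, tau, tau_lo)
        tau_hi = torch.where(mass < 1.0, tau, tau_hi)
    p = torch.clamp(z - tau_lo, min=0.0).pow(2)
    return p / p.sum(dim=-1, keepdim=True)   # normalise away bisection residue

class EntmaxWeightedGoodness(torch.nn.Module):
    """One learnable logit per neuron, mapped by entmax to non-negative weights
    (summing to 1) that decide how much each neuron's energy counts."""

    def __init__(self, num_neurons: int):
        super().__init__()
        self.logits = torch.nn.Parameter(torch.zeros(num_neurons))

    def forward(self, h: torch.Tensor) -> torch.Tensor:
        weights = entmax15(self.logits)            # becomes sparse as logits separate
        return (h.pow(2) * weights).sum(dim=-1)    # weighted energy per example

goodness_fn = EntmaxWeightedGoodness(num_neurons=500)
g = goodness_fn(torch.relu(torch.randn(32, 500)))  # shape (32,)
```

With alpha = 1, entmax reduces to softmax (fully dense weights); with alpha = 2, it becomes sparsemax (aggressively sparse). An alpha of about 1.5 sits between the two extremes the authors found suboptimal.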
- Introduced 'top-k goodness' for Forward-Forward networks, improving Fashion-MNIST accuracy by 22.6 percentage points over the sum-of-squares baseline.
- Combined top-k goodness with a new label injection method, FFCL (sketched after this list), to reach a final accuracy of 87.1%, a 30.7-point total improvement.
- Identified adaptive sparsity (using alpha ≈ 1.5 in entmax) as the key design principle, outperforming both fully dense and fully sparse aggregation methods.
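This summary does not spell out FFCL's mechanics, so the following is only a plausible reading of "injecting class information at every layer": concatenate a one-hot label to the input of each layer, with correct labels forming positive examples and incorrect labels forming negatives. All names and shapes below are illustrative assumptions, not the paper's implementation.

```python
import torch
import torch.nn.functional as F

def with_label(x: torch.Tensor, labels: torch.Tensor, num_classes: int) -> torch.Tensor:
    # Hypothetical injection: append a one-hot class vector to whatever feeds
    # the layer, so every layer "sees" the candidate label.
    one_hot = F.one_hot(labels, num_classes).float()
    return torch.cat([x, one_hot], dim=-1)

# The positive pass pairs inputs with their true labels; the negative pass pairs
# them with wrong labels. Each layer's goodness is then pushed up on positives
# and down on negatives, as in standard FF training.
x = torch.rand(32, 784)                                 # flattened Fashion-MNIST images
y_true = torch.randint(0, 10, (32,))
y_wrong = (y_true + torch.randint(1, 10, (32,))) % 10   # guaranteed-incorrect labels
layer = torch.nn.Linear(784 + 10, 500)
h_pos = torch.relu(layer(with_label(x, y_true, 10)))
h_neg = torch.relu(layer(with_label(x, y_wrong, 10)))
```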
Why It Matters
Provides a major performance leap for a biologically plausible AI training method, challenging the dominance of backpropagation for future efficient systems.