Research & Papers

Where to Bind Matters: Hebbian Fast Weights in Vision Transformers for Few-Shot Character Recognition

Brain-like fast learning boosts vision transformers, and a single well-placed module is all it takes.

Deep Dive

Standard transformer architectures learn fixed slow-weight representations during training and lack mechanisms for rapid adaptation within an episode. Biological neural systems, by contrast, achieve such adaptation through fast synaptic updates that form transient associative memories during inference, a mechanism known as Hebbian plasticity. In a new arXiv paper, Gavin Money and colleagues conduct an empirical study of Hebbian Fast-Weight (HFW) modules integrated into multiple transformer backbones, including ViT-Small, DeiT-Small, and Swin-Tiny. They evaluate six model variants on 5-way 1-shot and 5-way 5-shot classification tasks on the Omniglot benchmark under a Prototypical Network meta-learning framework.
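
The paper itself is empirical rather than code-oriented, but the core mechanism can be sketched. Below is a minimal, hypothetical PyTorch rendering of a Hebbian fast-weight layer in the spirit of the fast-weights literature: the projections are ordinary slow weights, while a fast matrix is rebuilt inside each episode by an outer-product Hebbian rule and read out as a residual. The class name, update-rule details, and hyperparameters are illustrative assumptions, not specifics from the paper.

```python
import torch
import torch.nn as nn

class HebbianFastWeight(nn.Module):
    """Sketch of a Hebbian fast-weight layer (names and hyperparameters
    are illustrative). The k/v/q projections are meta-learned slow
    weights; the matrix F is a transient per-episode associative memory."""

    def __init__(self, dim: int, decay: float = 0.9, lr: float = 0.5):
        super().__init__()
        self.to_k = nn.Linear(dim, dim, bias=False)  # slow weights
        self.to_v = nn.Linear(dim, dim, bias=False)  # slow weights
        self.to_q = nn.Linear(dim, dim, bias=False)  # slow weights
        self.decay, self.lr = decay, lr
        self.register_buffer("F", torch.zeros(dim, dim))  # fast weights

    def reset(self) -> None:
        """Discard the transient memory between episodes."""
        self.F.zero_()

    def write(self, x: torch.Tensor) -> None:
        """Hebbian binding over support tokens:
        F <- decay * F + lr * mean_i v_i k_i^T (outer-product rule)."""
        k, v = self.to_k(x), self.to_v(x)            # (n, dim) each
        outer = torch.einsum("nd,ne->de", v, k) / x.shape[0]
        self.F = self.decay * self.F + self.lr * outer

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        """Read: retrieve bound values with a query, residual-style."""
        q = self.to_q(x)                             # (n, dim)
        return x + q @ self.F.t()                    # row i gets F q_i
```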

The researchers propose a single-module placement strategy for Swin-Tiny in which one HFW module is applied to the final-stage feature map after all hierarchical stages have completed. This design avoids the training instability caused by placing separate Hebbian modules at each stage and achieves the highest test accuracy across all six models: 96.2% at 1-shot and 99.2% at 5-shot, outperforming the non-Hebbian baseline by +0.3 percentage points at 1-shot. The study further analyzes the interaction between Swin's shifted-window inductive bias and episode-level Hebbian binding, discusses why per-block placement fails for the ViT and DeiT variants in the low-data regime, and situates the results within the wider literature on fast- and slow-weight meta-learning.
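
To make the placement concrete, here is a hypothetical wrapper expressing the same idea: the backbone runs all of its hierarchical stages untouched, and a single HFW module (the sketch above) operates only on the final-stage features. The wrapper name and the assumption that the backbone returns pooled embeddings are illustrative, not taken from the paper.

```python
import torch.nn as nn

class SwinWithFinalHFW(nn.Module):
    """Hypothetical wrapper mirroring the single-placement strategy:
    one HFW module on the pooled final-stage feature map, instead of
    separate Hebbian modules inside every hierarchical stage."""

    def __init__(self, backbone: nn.Module, hfw: nn.Module):
        super().__init__()
        self.backbone = backbone  # e.g. Swin-Tiny with its head removed
        self.hfw = hfw            # the HebbianFastWeight sketch above

    def forward(self, images):
        feats = self.backbone(images)  # (n, dim) final-stage embeddings
        return self.hfw(feats)         # single Hebbian read, residual path
```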

Key Points
  • Single HFW module at Swin-Tiny's final stage yields 96.2% 1-shot, 99.2% 5-shot accuracy on Omniglot.
  • Per-block placement in ViT/DeiT causes training instability in the low-data regime; Swin's shifted-window bias mitigates this.
  • HFW modules enable rapid episode-level adaptation without retraining, mimicking biological Hebbian plasticity (see the episode sketch after this list).
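
The episode-level protocol behind the third point can be sketched as follows, again as an illustrative assumption rather than the authors' code: per episode, the fast-weight memory is reset, the support set is bound into it, and queries are classified by distance to class prototypes, with no gradient updates to the slow weights.

```python
import torch

@torch.no_grad()
def run_episode(model, support_x, support_y, query_x, n_way: int = 5):
    """One hypothetical N-way episode under the Prototypical Network
    protocol: slow weights stay frozen; only the fast-weight memory
    adapts to the support set and is discarded afterwards."""
    model.hfw.reset()                            # fresh associative memory
    model.hfw.write(model.backbone(support_x))   # bind support features
    z_s = model(support_x)                       # (n_way * k_shot, dim)
    protos = torch.stack(
        [z_s[support_y == c].mean(dim=0) for c in range(n_way)])
    z_q = model(query_x)                         # (n_query, dim)
    logits = -torch.cdist(z_q, protos)           # nearest-prototype rule
    return logits.argmax(dim=1)                  # predicted episode labels
```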

Why It Matters

Enables vision models to learn new classes from just a few examples, cutting retraining costs in deployment.