LSFormer beats prior spiking transformers with an 8.6% accuracy gain on neuromorphic data
A local dilated-window attention mechanism sidesteps the global self-attention bottleneck in spiking neural networks
Transformer-based spiking neural networks (SNNs) have shown strong results but suffer from two key bottlenecks: max pooling discards regional feature information, and global self-attention incurs a computational cost quadratic in the number of tokens, which runs counter to the energy efficiency that motivates SNNs in the first place. To address these problems, Lingdong Li and colleagues developed LSFormer (Local Structure-Aware Spiking Transformer), an architecture that introduces Spiking Response Pooling (SPooling) to better capture representative regional features and Local Structure-Aware Spiking Self-Attention (LS-SSA) to reduce redundant attention computation.
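To see why max pooling loses regional information on spike maps, consider a binary (0/1) spike map: a 2×2 max pool outputs 1 whether one neuron or all four neurons in the region fired. The sketch below contrasts max pooling with plain average (firing-rate) pooling. The article does not describe how SPooling actually works, so the helpers `pool2x2`, `max_pool`, and `mean_pool` are hypothetical, and average pooling is only a stand-in to illustrate the information loss, not SPooling itself.

```python
# Hypothetical illustration: why max pooling discards information on binary spike maps.
# mean_pool below is a simple stand-in, NOT the SPooling formulation from LSFormer.

def pool2x2(spikes, reduce):
    """Apply `reduce` to each non-overlapping 2x2 window of a 2D 0/1 spike map."""
    h, w = len(spikes), len(spikes[0])
    out = []
    for i in range(0, h - 1, 2):
        row = []
        for j in range(0, w - 1, 2):
            window = [spikes[i][j], spikes[i][j + 1],
                      spikes[i + 1][j], spikes[i + 1][j + 1]]
            row.append(reduce(window))
        out.append(row)
    return out

def max_pool(window):
    # 1 if any neuron in the region fired, regardless of how many did.
    return max(window)

def mean_pool(window):
    # Fraction of neurons in the region that fired (keeps firing-rate information).
    return sum(window) / len(window)

# Two regions with very different activity: the left 2x2 block has one spike,
# the right 2x2 block has four.
spike_map = [
    [1, 0, 1, 1],
    [0, 0, 1, 1],
]

print(pool2x2(spike_map, max_pool))   # [[1, 1]] -> both regions look identical
print(pool2x2(spike_map, mean_pool))  # [[0.25, 1.0]]
```

Max pooling maps both regions to the same value, while the rate-based output still distinguishes a sparse region from a saturated one; this is the kind of regional information a replacement pooling operator aims to keep.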
LSFormer employs a local dilated window mechanism that balances local detail with long-range dependencies; the authors present this as the first application of such an approach to spiking transformers. The results are striking: LSFormer improves top-1 classification accuracy over previous state-of-the-art SNN models by 4.3% on the challenging static Tiny-ImageNet dataset and by 8.6% on the neuromorphic N-CALTECH101 dataset. These results suggest that spiking transformers can keep the cost of attention down while still improving accuracy, which makes the approach relevant to edge computing and low-power vision applications.
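A dilated window keeps the per-token attention cost fixed while widening the area each token sees. The sketch below enumerates the neighbors of a token in a k×k window with dilation d on an H×W token grid, then counts query-key pairs for global attention versus the dilated local window. The window size, dilation, boundary handling, and function names are illustrative assumptions; LSFormer's actual window configuration is not given in this article.

```python
# Hypothetical sketch of a dilated local window on an H x W token grid.
# k (window size) and d (dilation) are illustrative, not LSFormer's settings.

def dilated_neighbors(i, j, h, w, k=3, d=2):
    """Return grid positions in a k x k window centered on (i, j), spaced d apart.

    Positions that fall outside the grid are skipped (no padding)."""
    r = k // 2
    out = []
    for di in range(-r, r + 1):
        for dj in range(-r, r + 1):
            y, x = i + di * d, j + dj * d
            if 0 <= y < h and 0 <= x < w:
                out.append((y, x))
    return out

def attention_pairs(h, w, k=3, d=2):
    """Count query-key pairs for global vs. dilated-local attention."""
    n = h * w
    global_pairs = n * n  # every token attends to every token: O(N^2)
    local_pairs = sum(len(dilated_neighbors(i, j, h, w, k, d))
                      for i in range(h) for j in range(w))  # at most N * k^2
    return global_pairs, local_pairs

# A 3x3 window with dilation 2 spans a 5x5 region but touches only 9 tokens.
print(dilated_neighbors(4, 4, 8, 8))
print(attention_pairs(8, 8))  # (4096, 400)
```

On an 8×8 grid, global attention needs 4,096 query-key pairs while a 3×3 window with dilation 2 needs 400 (below the 64 × 9 = 576 upper bound because edge tokens lose out-of-bounds neighbors), yet each interior window still covers a 5×5 region. Cost grows linearly with the number of tokens for a fixed window, instead of quadratically.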
- LSFormer introduces a local dilated window self-attention that captures both local details and long-range dependencies, reducing computational redundancy.
- Spiking Response Pooling (SPooling) replaces standard max pooling to comprehensively preserve representative regional features.
- Outperforms prior state-of-the-art SNN models in top-1 accuracy by 4.3% on Tiny-ImageNet and 8.6% on N-CALTECH101.
Why It Matters
Eases the efficiency vs. accuracy trade-off in spiking neural networks, a step toward practical, energy-efficient vision AI at the edge.