Neuromorphic SpeechMamba Models Achieve 70% Sparsity for Efficient ASR
Event-driven and spiking variants reach over 60% and over 70% activation sparsity, with under 1% accuracy loss for the event-driven model
Deep learning has revolutionized automatic speech recognition (ASR), enabling deployment on edge devices such as smartphones and smart home devices. However, the computational and energy demands of deep neural networks add latency and limit real-time interaction. Neuromorphic computing addresses this by introducing activation sparsity through spiking neural networks (SNNs) and event-driven neural networks, which turn dense operations into sparse computations. In a new paper accepted at IJCNN 2026, researchers present spiking and event-driven neuromorphic versions of the state-of-the-art SpeechMamba model, in what is described as the first study to evaluate the hardware benefits of different neuromorphic strategies for ASR.
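To make the link between activation sparsity and saved computation concrete, the sketch below shows a matrix-vector product that skips zero activations and counts the multiply-accumulate operations (MACs) it performs. It is a minimal illustration under stated assumptions, not code from the paper: the function name `event_driven_matvec` and the toy weights and activations are hypothetical.

```python
def event_driven_matvec(weights, activations):
    """Compute weights @ activations while skipping zero activations.

    Hypothetical helper for illustration only.
    Returns (output_vector, number_of_MACs_performed).
    """
    n_out = len(weights)
    output = [0.0] * n_out
    macs = 0
    for j, a in enumerate(activations):
        if a == 0:
            continue  # no event on this input: its whole weight column is skipped
        for i in range(n_out):
            output[i] += weights[i][j] * a
            macs += 1
    return output, macs


# Toy example: 2 outputs, 4 inputs, 3 of the 4 activations are zero.
weights = [[1.0, 2.0, 3.0, 4.0],
           [0.5, 0.5, 0.5, 0.5]]
activations = [0.0, 2.0, 0.0, 0.0]

out, macs = event_driven_matvec(weights, activations)
dense_macs = len(weights) * len(activations)  # cost of a dense product
print(out, macs, dense_macs)
```

In this toy case only one of the four inputs carries an event, so the loop performs 2 MACs where a dense product would perform 8. Real hardware gains also depend on memory access and scheduling overheads, which this sketch does not model.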
The team introduces an event-driven SpeechMamba with the FATReLU activation, achieving over 60% activation sparsity with less than 1% accuracy degradation on the LibriSpeech benchmark. They also propose a spiking SpeechMamba that attains over 70% sparsity while using 30% fewer parameters than comparable SNNs. To support real-world deployment, they developed a cycle-accurate event-driven simulator for flexible algorithm-hardware co-exploration; it identifies computational bottlenecks and yields over 10% additional efficiency improvement. The work indicates that neuromorphic approaches can substantially reduce energy consumption for edge ASR with minimal accuracy loss.
- Event-driven SpeechMamba with FATReLU achieves >60% activation sparsity with <1% accuracy loss on LibriSpeech
- Spiking SpeechMamba attains >70% sparsity while using 30% fewer parameters than comparable SNNs
- Cycle-accurate simulator enables co-optimization, yielding over 10% extra efficiency improvements
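The sparsity figures above refer to the fraction of activations that are exactly zero. The sketch below shows how a thresholded ReLU raises that fraction compared with a plain ReLU. It assumes the common definition of FATReLU as a ReLU with a positive cutoff (values below the threshold are set to zero); the threshold value, the toy inputs, and the helper names are hypothetical and are not taken from the paper.

```python
def fatrelu(x, threshold):
    """Thresholded ReLU (assumed FATReLU form): keep x only if x >= threshold."""
    return x if x >= threshold else 0.0


def activation_sparsity(values):
    """Fraction of activations that are exactly zero."""
    return sum(1 for v in values if v == 0.0) / len(values)


# Toy pre-activation values, chosen for illustration only.
pre_activations = [-1.2, 0.05, 0.3, 0.8, -0.4, 0.1, 1.5, 0.02, 0.25, -0.7]
THRESHOLD = 0.2  # assumed value; in practice this would be tuned per layer

relu_out = [max(0.0, v) for v in pre_activations]
fat_out = [fatrelu(v, THRESHOLD) for v in pre_activations]

relu_sparsity = activation_sparsity(relu_out)
fat_sparsity = activation_sparsity(fat_out)
print(relu_sparsity, fat_sparsity)
```

On these toy values, plain ReLU zeroes only the three negative inputs, while the thresholded version also removes the three small positive ones. The trade-off is that a higher threshold discards more information, which is why sparsity is reported alongside accuracy degradation.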
Why It Matters
The approach enables efficient, low-latency speech recognition on edge devices and reduces the energy cost of real-time AI.