TokenSE: a Mamba-based discrete token speech enhancement framework for cochlear implants
A new Mamba-based model processes speech with linear computational complexity and outperforms baseline methods for cochlear implant users.
Researchers Hsin-Tien Chiang and John H. L. Hansen have introduced TokenSE, a speech enhancement framework designed for cochlear implant (CI) users. The system first converts degraded speech into a sequence of discrete tokens using a neural audio codec. Its core component is a Mamba-based model, a selective state-space architecture, that predicts the indices of the corresponding clean tokens from this noisy input. The choice matters because of how cost grows with sequence length: Transformer self-attention scales quadratically, whereas Mamba's input-dependent selection mechanism lets it process a sequence with linear computational complexity. That efficiency makes it better suited to real-time processing on the constrained hardware typical of hearing aids and implants.
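To make the token-in, token-out idea concrete, here is a minimal sketch in plain Python. It is not the authors' model and not Mamba itself: it is a toy gated recurrence whose gate depends on the current input, in the spirit of a selective state-space layer. The codebook size, state width, random weights, and every function name are assumptions made purely for illustration.

```python
import math
import random

CODEBOOK_SIZE = 8  # hypothetical; real neural codecs use far larger codebooks
STATE_DIM = 4      # hypothetical hidden width


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def matvec(matrix, vector):
    """Multiply a list-of-rows matrix by a vector."""
    return [sum(w * v for w, v in zip(row, vector)) for row in matrix]


def make_params(seed=0):
    """Random, untrained weights; a real system would learn these."""
    rng = random.Random(seed)

    def rand_matrix(rows, cols):
        return [[rng.uniform(-1, 1) for _ in range(cols)] for _ in range(rows)]

    return {
        "embed": rand_matrix(CODEBOOK_SIZE, STATE_DIM),  # token index -> vector
        "gate": rand_matrix(STATE_DIM, STATE_DIM),       # input -> per-channel gate
        "out": rand_matrix(CODEBOOK_SIZE, STATE_DIM),    # state -> codebook logits
    }


def predict_clean_tokens(noisy_tokens, params):
    """Map noisy token indices to predicted clean token indices in one pass."""
    state = [0.0] * STATE_DIM
    predicted = []
    for token in noisy_tokens:  # one state update per token: linear in length
        x = params["embed"][token]
        # The gate is computed from the input itself, so how much history
        # is kept or overwritten changes from token to token.
        gate = [sigmoid(g) for g in matvec(params["gate"], x)]
        state = [g * h + (1.0 - g) * xi for g, h, xi in zip(gate, state, x)]
        logits = matvec(params["out"], state)
        predicted.append(max(range(CODEBOOK_SIZE), key=logits.__getitem__))
    return predicted


params = make_params()
noisy = [3, 1, 4, 1, 5, 2, 6]  # made-up token indices standing in for codec output
clean_guess = predict_clean_tokens(noisy, params)
print(clean_guess)
```

Because the weights are untrained, the printed indices carry no meaning. The point is the shape of the computation: a fixed-size state is updated once per token, so the work grows in step with the number of tokens rather than with the number of token pairs.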
The research demonstrates that TokenSE consistently outperforms existing baseline methods in objective evaluations across both in-domain and out-of-domain datasets. More importantly, subjective listening tests conducted with actual cochlear implant users showed a clear and measurable benefit. Participants experienced significantly improved speech intelligibility in challenging acoustic environments filled with noise and reverberation. By working directly in the compressed token space of a neural codec and leveraging Mamba's efficiency, TokenSE presents a promising path toward deploying powerful, on-device AI for auditory assistance without the prohibitive computational cost of Transformer-based models.
- Uses a Mamba-based model for linear-time processing, avoiding the quadratic complexity of Transformer self-attention.
- Operates in a discrete neural audio codec token space, predicting clean tokens from degraded speech input.
- Subjective tests with cochlear implant users confirm improved speech intelligibility in noisy, reverberant conditions.
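
The linear-versus-quadratic contrast in the first point can be illustrated with a rough count. The sketch below assumes full self-attention scores every pair of tokens, while a recurrent scan performs one state update per token. It ignores constant factors, model width, and hardware effects, so it shows growth rates only, not measured speed; the function names are hypothetical.

```python
def attention_pair_scores(seq_len):
    """Pairwise scores in full self-attention: every token against every token."""
    return seq_len * seq_len


def scan_state_updates(seq_len):
    """State updates in a recurrent scan: one per token."""
    return seq_len


for seq_len in (100, 1_000, 10_000):
    pairs = attention_pair_scores(seq_len)
    updates = scan_state_updates(seq_len)
    print(f"{seq_len:>6} tokens: {pairs:>11,} pair scores vs "
          f"{updates:>6,} state updates ({pairs // updates:,}x)")
```

Doubling the sequence length doubles the scan's work but quadruples the number of attention pair scores, which is why the gap widens on longer audio.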
Why It Matters
TokenSE points toward more efficient, real-time AI hearing assistance that could improve speech understanding for cochlear implant users in everyday noisy settings.