Bottleneck Transformer-Based Approach for Improved Automatic STOI Score Prediction
New AI model estimates speech intelligibility without needing a clean reference audio file.
Researchers led by Murali Kadambi developed a Bottleneck Transformer model for predicting the STOI (Short-Time Objective Intelligibility) score. The architecture combines convolutional blocks, which extract frame-level features, with multi-head self-attention (MHSA), which aggregates key information across frames. The authors report that it outperforms state-of-the-art models, achieving higher correlation and lower mean squared error in both seen and unseen scenarios. This enables more accurate, non-intrusive assessment of speech intelligibility in real-world, noisy conditions.
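To make the two reported metrics concrete, here is a minimal sketch of how correlation and mean squared error could be computed between predicted and reference STOI scores. It assumes Pearson correlation, which the summary does not specify. The function names and the sample scores are hypothetical illustrations, not values or code from the paper.

```python
import math


def pearson_correlation(predicted, target):
    """Pearson correlation between predicted and reference scores."""
    n = len(predicted)
    if n != len(target) or n < 2:
        raise ValueError("need two equal-length sequences with at least 2 items")
    mean_p = sum(predicted) / n
    mean_t = sum(target) / n
    # Sums of centered products and centered squares (the 1/n factors cancel).
    cov = sum((p - mean_p) * (t - mean_t) for p, t in zip(predicted, target))
    var_p = sum((p - mean_p) ** 2 for p in predicted)
    var_t = sum((t - mean_t) ** 2 for t in target)
    if var_p == 0 or var_t == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / math.sqrt(var_p * var_t)


def mean_squared_error(predicted, target):
    """Average squared difference between predicted and reference scores."""
    if len(predicted) != len(target) or not predicted:
        raise ValueError("need two equal-length, non-empty sequences")
    return sum((p - t) ** 2 for p, t in zip(predicted, target)) / len(predicted)


# Hypothetical scores, for illustration only. STOI values typically lie in [0, 1].
reference_stoi = [0.2, 0.4, 0.6, 0.8]      # computed intrusively, with clean audio
predicted_stoi = [0.25, 0.35, 0.65, 0.75]  # estimated from the noisy audio alone

correlation = pearson_correlation(predicted_stoi, reference_stoi)
mse = mean_squared_error(predicted_stoi, reference_stoi)
print(f"correlation={correlation:.4f}  mse={mse:.4f}")
```

A better predictor pushes the correlation toward 1 and the mean squared error toward 0. The two are reported together because a model can rank utterances correctly (high correlation) while still being offset from the true scores (high error).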
Why It Matters
The model could improve automated testing for hearing aids, voice assistants, and telecom systems by estimating speech intelligibility in noisy environments where no clean reference recording is available.