Audio & Speech

Multi-Speaker DOA Estimation in Binaural Hearing Aids using Deep Learning and Speaker Count Fusion

New CRNN model uses speaker-count fusion to sharpen direction-of-arrival estimates in noisy rooms.

Deep Dive

Researchers from the University of Ottawa and GN Hearing have published a paper on arXiv (arXiv:2509.21382) demonstrating a deep learning approach to improving direction-of-arrival (DOA) estimation in binaural hearing aids. Their convolutional recurrent neural network (CRNN), which takes spectral phase differences and magnitude ratios between the left and right microphone signals as input, was enhanced by integrating source-count information through late fusion. This method, which feeds the estimated number of active speakers (0, 1, or 2+) into the network as an auxiliary feature, yielded up to 14% higher average F1-scores than the baseline CRNN on real-world binaural recordings.
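To make the pipeline concrete, here is a minimal NumPy sketch of the two ingredients described above: the binaural input features (per time-frequency phase difference and magnitude ratio between the left and right microphones) and late fusion of a speaker-count class into the feature stream. The STFT parameters, embedding size, and function names are illustrative assumptions, not details taken from the paper.

```python
import numpy as np

def stft(x, frame=512, hop=256):
    """Minimal STFT: Hann-windowed frames -> complex spectra."""
    win = np.hanning(frame)
    n_frames = 1 + (len(x) - frame) // hop
    frames = np.stack([x[i * hop : i * hop + frame] * win
                       for i in range(n_frames)])
    return np.fft.rfft(frames, axis=1)  # (n_frames, frame // 2 + 1)

def binaural_features(left, right, eps=1e-8):
    """Per time-frequency bin: interaural phase difference and
    log magnitude ratio between left and right channels."""
    L, R = stft(left), stft(right)
    ipd = np.angle(L) - np.angle(R)                           # phase difference
    ilr = np.log(np.abs(L) + eps) - np.log(np.abs(R) + eps)   # log magnitude ratio
    return np.stack([ipd, ilr], axis=-1)                      # (frames, bins, 2)

def late_fusion(crnn_embedding, speaker_count):
    """Late fusion: append the speaker-count class (0, 1, or 2+)
    as a one-hot auxiliary feature to an internal CRNN embedding
    before the DOA classification head."""
    one_hot = np.eye(3)[min(speaker_count, 2)]
    return np.concatenate([crnn_embedding, one_hot])

# Toy usage with a synthetic stereo signal (hypothetical sizes)
rng = np.random.default_rng(0)
left, right = rng.standard_normal(4096), rng.standard_normal(4096)
feats = binaural_features(left, right)
fused = late_fusion(rng.standard_normal(128), speaker_count=2)
print(feats.shape, fused.shape)  # (15, 257, 2) (131,)
```

The key design point is that the count estimate is injected late, near the output head, so the spectral front-end stays unchanged and the count acts purely as contextual side information.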

The study also explored dual-task training for joint DOA estimation and source counting, but found it did not improve DOA performance, though it did benefit source-count prediction. The key insight is that using a ground-truth (oracle) source count significantly enhances standalone DOA estimation, particularly in noisy, multi-speaker environments common in everyday hearing aid use. This work, set to appear in IEEE ICASSP 2026, highlights the potential of fusing source-count information for more robust auditory scene analysis in assistive hearing devices.
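The dual-task variant mentioned above can be sketched as a shared encoder with two heads trained on a weighted sum of a DOA loss and a source-count loss. The loss weighting, and the framing of DOA as multi-label classification over discrete azimuth sectors, are illustrative assumptions rather than details from the paper.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

def dual_task_loss(doa_logits, doa_targets, count_logits, count_target, lam=0.5):
    """Joint objective: multi-label DOA loss plus a weighted
    source-count loss (lam is a hypothetical weighting)."""
    # DOA head: binary cross-entropy per azimuth sector, since
    # several speakers can be active simultaneously.
    p = sigmoid(doa_logits)
    bce = -np.mean(doa_targets * np.log(p + 1e-8)
                   + (1 - doa_targets) * np.log(1 - p + 1e-8))
    # Count head: categorical cross-entropy over {0, 1, 2+}.
    q = softmax(count_logits)
    ce = -np.log(q[count_target] + 1e-8)
    return bce + lam * ce

# Toy example: 72 azimuth sectors (5-degree grid, an assumed
# discretization), with two active speakers
rng = np.random.default_rng(1)
doa_targets = np.zeros(72)
doa_targets[[10, 40]] = 1.0
loss = dual_task_loss(rng.standard_normal(72), doa_targets,
                      rng.standard_normal(3), count_target=2)
```

The paper's finding is that optimizing both heads jointly helped the count head but not the DOA head, whereas conditioning DOA estimation on an oracle count did help, which motivates the late-fusion design.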

Key Points
  • CRNN model uses spectral phase differences and magnitude ratios for DOA estimation
  • Late fusion of speaker count information improves F1-score by up to 14%
  • Dual-task training did not improve DOA performance, but enhanced source-count prediction

Why It Matters

Better direction-of-arrival estimation in hearing aids means clearer speech understanding in crowded, noisy environments.