New deep beamformer LC-DBF outperforms classical LCMV in multi-speaker audio
A deep neural network learns to steer nulls and preserve target speakers with linear constraints.
Ilai Zaidel, Ori Engel, Bar Engel, and Sharon Gannot of Bar-Ilan University have introduced a deep beamforming framework called the Linearly Constrained Deep Beamformer (LC-DBF). The method addresses a fundamental challenge in multi-speaker environments: isolating a target speaker while suppressing interfering speakers and background noise. A classical linearly constrained minimum variance (LCMV) beamformer first estimates a covariance matrix and then computes its weights from that estimate; LC-DBF instead trains a deep neural network (DNN) to predict the beamforming weights directly from the raw multichannel audio. The training loss follows an augmented Lagrangian approach: it combines a signal reconstruction term with hard linear constraints that enforce a distortionless response toward the target direction and place nulls on the estimated interference subspace. The DNN is guided by the target relative transfer function (RTF) and by the interference subspace estimates, so it learns the spatial filter without explicitly separating noise from interference.
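To make the two ingredients concrete, here is a minimal NumPy sketch of the classical closed-form LCMV weights and of a generic augmented Lagrangian penalty on the same linear constraints. It is an illustration under stated assumptions, not the authors' implementation: the function names, the toy array sizes, the random covariance and constraint matrices, and the exact form of the loss are all hypothetical, since the article does not give the paper's loss or network details.

```python
import numpy as np

def lcmv_weights(R, C, g):
    """Closed-form LCMV weights: w = R^{-1} C (C^H R^{-1} C)^{-1} g."""
    Rinv_C = np.linalg.solve(R, C)             # R^{-1} C, shape (M, K)
    gram = C.conj().T @ Rinv_C                 # C^H R^{-1} C, shape (K, K)
    return Rinv_C @ np.linalg.solve(gram, g)   # weights, shape (M,)

def augmented_lagrangian_loss(w, C, g, recon_err, lam, rho):
    """Generic form (an assumption, not the paper's exact loss):
    recon_err + Re(lam^H r) + (rho / 2) * ||r||^2, with r = C^H w - g."""
    r = C.conj().T @ w - g                     # constraint residual, shape (K,)
    return (recon_err
            + np.real(np.vdot(lam, r))         # multiplier term
            + 0.5 * rho * np.real(np.vdot(r, r)))  # quadratic penalty

# Toy setup (hypothetical sizes): M = 4 microphones, K = 2 constraints.
rng = np.random.default_rng(0)
M, K = 4, 2
A = rng.standard_normal((M, 8)) + 1j * rng.standard_normal((M, 8))
R = A @ A.conj().T / 8 + np.eye(M)             # Hermitian positive-definite covariance
# Column 0 stands in for the target RTF, column 1 for one interference direction.
C = rng.standard_normal((M, K)) + 1j * rng.standard_normal((M, K))
g = np.array([1.0, 0.0], dtype=complex)        # distortionless target, null on interferer

w = lcmv_weights(R, C, g)
lam = np.zeros(K, dtype=complex)               # multipliers, zero for this demo
rho = 10.0                                     # penalty weight (arbitrary choice)

loss_feasible = augmented_lagrangian_loss(w, C, g, 0.0, lam, rho)
loss_violating = augmented_lagrangian_loss(np.zeros(M, dtype=complex), C, g, 0.0, lam, rho)
```

In this sketch the closed-form weights satisfy both constraints, so the penalty terms vanish, while the all-zero weight vector violates the distortionless constraint and is penalized. In a learned beamformer the weights `w` would come from a network rather than from `lcmv_weights`, and the multipliers would typically be updated during training.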
The reported experiments show advantages over a classical LCMV beamformer built from the same spatial signatures: LC-DBF achieves better overall enhancement performance, with more controlled sidelobes and stronger background-noise attenuation. By embedding the linear constraints directly in the learning objective, the model avoids the approximation errors and instability often seen in two-stage approaches. The work is relevant to applications such as smart speakers, hearing aids, teleconferencing systems, and autonomous voice interfaces that must operate in crowded acoustic scenes. The paper is available on arXiv (2605.21141) and is a step toward data-driven, real-time multi-speaker audio processing.
- LC-DBF uses a DNN trained with an augmented Lagrangian loss to enforce linear spatial constraints for target speaker enhancement.
- The model achieves superior overall performance, more controlled sidelobes, and better noise attenuation versus classical LCMV.
- Guidance from target RTF and interference subspace estimates enables direct weight estimation from noisy multichannel inputs.
Why It Matters
This deep beamformer could improve voice pickup in crowded rooms for hearing aids and smart devices.