Wavelet Scattering Priors Help a Neural Field Reconstruct Sound from Sparse Data
New AI technique upsamples HRTFs from sparse measurements using multi-scale statistical priors.
A team of researchers (Xinmeng Luan, Samuel A. Verburg, Efren Fernandez-Grande, Gary Scavone) from McGill University has introduced a framework for sound field reconstruction that uses the Wavelet Scattering Transform (WST) as a multi-scale feature extractor. The method formulates reconstruction as an optimization problem solved with a neural field, and adds WST coefficients to the training loss to impose statistical priors when observations are sparse. As a proof of concept, the authors validate the approach on Head-Related Transfer Function (HRTF) upsampling: estimating a densely sampled HRTF from measurements at only a few directions. This task is critical for personalized spatial audio in VR and hearing aids.
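To make the "WST as a prior" idea concrete, here is a minimal sketch in Python with NumPy. It makes several assumptions that do not come from the paper. It uses a 1D signal rather than HRTFs sampled over directions and frequencies. It builds a hypothetical Gaussian band-pass filter bank (`analytic_filter_bank`, with made-up parameters `J` and `Q`). It uses global averaging in place of a low-pass window. The loss `wst_regularized_loss` is also illustrative: a mean-squared data term on the observed samples plus a weighted penalty on the mismatch in scattering coefficients, with `pred` standing in for the neural field's output on a dense grid.

```python
import numpy as np


def analytic_filter_bank(n, J, Q=1):
    """Hypothetical band-pass filter bank defined in the frequency domain.

    Filter k has centre frequency xi_k = 0.4 * 2**(-k / Q) (cycles/sample),
    so larger k means a coarser scale. Each filter is a Gaussian bump kept
    only at positive frequencies, which makes it analytic and zero at DC.
    """
    freqs = np.fft.fftfreq(n)
    bank = []
    for k in range(J * Q):
        xi = 0.4 * 2.0 ** (-k / Q)
        sigma = xi / (3.0 * Q)  # bandwidth grows with centre frequency
        bump = np.exp(-((freqs - xi) ** 2) / (2.0 * sigma**2))
        bank.append(np.where(freqs > 0, bump, 0.0))
    return np.array(bank)  # shape (J * Q, n)


def scattering(x, bank):
    """Order 0, 1 and 2 scattering coefficients with global averaging.

    Averaging over the whole (circular) signal plays the role of the
    low-pass window, so the output is invariant to circular shifts.
    """
    s0 = np.array([x.mean()])
    # First order: |x * psi_j1|, computed by circular convolution via the FFT.
    u1 = np.abs(np.fft.ifft(np.fft.fft(x)[None, :] * bank, axis=-1))
    s1 = u1.mean(axis=-1)
    # Second order: ||x * psi_j1| * psi_j2| for coarser scales j2 > j1 only.
    u1_hat = np.fft.fft(u1, axis=-1)
    s2 = [
        np.abs(np.fft.ifft(u1_hat[j1] * bank[j2])).mean()
        for j1 in range(len(bank))
        for j2 in range(j1 + 1, len(bank))
    ]
    return np.concatenate([s0, s1, np.array(s2)])


def wst_regularized_loss(pred, obs_idx, obs_val, target, bank, lam=0.1, mask=None):
    """Data fidelity on observed samples plus a scattering-statistics prior.

    `pred` stands in for the neural field evaluated on a dense grid and
    `target` for reference scattering coefficients; the optional boolean
    `mask` keeps only selected coefficients in the prior term.
    """
    data_term = np.mean((pred[obs_idx] - obs_val) ** 2)
    diff = scattering(pred, bank) - target
    if mask is not None:
        diff = np.where(mask, diff, 0.0)
    return data_term + lam * np.mean(diff**2)


# Usage: a perfect reconstruction of a toy signal gives zero loss.
rng = np.random.default_rng(0)
n = 256
t = np.arange(n)
signal = np.sin(2 * np.pi * 0.05 * t) + 0.5 * np.sin(2 * np.pi * 0.2 * t)
bank = analytic_filter_bank(n, J=4)
target = scattering(signal, bank)  # 1 + 4 + 6 = 11 coefficients
obs_idx = rng.choice(n, size=16, replace=False)  # sparse observations
print(wst_regularized_loss(signal, obs_idx, signal[obs_idx], target, bank))  # 0.0
```

Because each coefficient averages a wavelet-modulus response over the whole signal, the coefficients describe how energy is distributed across scales rather than where it occurs. That is why a term like this can act as a prior on unobserved regions while the data term only constrains the few observed samples.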
A key innovation is a two-phase masking strategy. In phase one, a binary mask is learned from a small multi-subject dataset. In phase two, this mask is applied to the WST coefficients of an individual HRTF, so that only the most informative statistical structures are preserved during reconstruction. The authors compare the framework against baseline methods, which also serve as an ablation study, and report that it is effective. The paper (5 pages, 2 figures) was submitted to arXiv on June 3, 2026, and is listed under Audio and Speech Processing, Sound, and Signal Processing.
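The article does not say how the binary mask is learned, so the sketch below only illustrates the two-phase structure under an assumed criterion: keep the coefficients whose relative spread across subjects is smallest. It uses synthetic arrays in place of real WST coefficients, and `learn_binary_mask`, `apply_mask`, and `keep_ratio` are hypothetical names.

```python
import numpy as np


def learn_binary_mask(subject_coeffs, keep_ratio=0.5, eps=1e-12):
    """Phase 1 (hypothetical criterion): learn a mask from several subjects.

    subject_coeffs has shape (n_subjects, n_coeffs). Coefficients whose
    relative spread (std / |mean|) across subjects is smallest are kept,
    on the assumption that consistent statistics make the most reliable prior.
    """
    mean = subject_coeffs.mean(axis=0)
    std = subject_coeffs.std(axis=0)
    variation = std / (np.abs(mean) + eps)
    n_keep = max(1, int(round(keep_ratio * subject_coeffs.shape[1])))
    mask = np.zeros(subject_coeffs.shape[1], dtype=bool)
    mask[np.argsort(variation)[:n_keep]] = True
    return mask


def apply_mask(coeffs, mask):
    """Phase 2: keep only the masked coefficients of one individual's WST."""
    return np.where(mask, coeffs, 0.0)


# Synthetic stand-in for a small multi-subject dataset: 8 subjects,
# 10 coefficients, of which the first 3 are nearly identical across subjects.
rng = np.random.default_rng(1)
base = rng.uniform(1.0, 2.0, size=10)
spread = np.array([0.01] * 3 + [0.5] * 7)
subjects = base + spread * rng.standard_normal((8, 10))

mask = learn_binary_mask(subjects, keep_ratio=0.3)
individual = base + spread * rng.standard_normal(10)  # a new, held-out subject
masked = apply_mask(individual, mask)
print(mask)
```

Ranking by relative spread is just one plausible way to decide which statistics are "informative"; the masked coefficients would then enter a reconstruction loss in place of the full set.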
- Uses the Wavelet Scattering Transform (WST) as a multi-scale feature extractor to impose statistical priors when observations are sparse.
- Employs a two-phase masking strategy: first learns a binary mask from multi-subject data, then applies it to the WST coefficients of an individual HRTF.
- Validated on HRTF upsampling against baseline methods that double as an ablation study, as reported in a 5-page paper with 2 figures.
Why It Matters
Could enable high-quality spatial audio and sound field reconstruction from fewer measurements, benefiting VR, hearing aids, and acoustic simulation.