SHroom: A Python Framework for Ambisonics Room Acoustics Simulation and Binaural Rendering
New open-source framework achieves perceptual transparency for VR/AR audio, and its slowdown against a reference shrinks from 7x to 3.1x as scenes grow from 1 to 8 sources.
Researcher Yhonatan Gayer has released SHroom, an open-source Python framework for simulating 3D audio for virtual and augmented reality. The library uses an Ambisonics approach, projecting sound sources from a simulated room onto a Spherical Harmonics (SH) basis. This creates a flexible, composable pipeline for tasks like binaural decoding (creating a realistic stereo headphone experience) and simulating spherical microphone arrays. Crucially, SHroom achieves what is termed "perceptual transparency": its simulated audio is nearly indistinguishable from a high-fidelity reference. Benchmarked against the established pyroomacoustics library, SHroom with its Magnitude Least Squares (MagLS) decoder scored a Log Spectral Distance of 2.02 dB at a low SH order of N=5, which sits at the upper edge of the 1-2 dB range considered the Just Noticeable Difference for humans.
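To make the Log Spectral Distance figure concrete, here is a minimal sketch of one common way to compute it: the root-mean-square of the dB magnitude difference across frequency bins. The function name `log_spectral_distance` and the toy transfer functions are illustrative assumptions, not SHroom's API, and the paper's exact definition (frequency range, averaging over directions or ears) may differ.

```python
import numpy as np

def log_spectral_distance(h_ref, h_test, eps=1e-12):
    """RMS difference in dB between two magnitude spectra (one common LSD definition)."""
    mag_ref = np.abs(h_ref) + eps    # eps guards against log of zero
    mag_test = np.abs(h_test) + eps
    diff_db = 20.0 * np.log10(mag_ref / mag_test)
    return float(np.sqrt(np.mean(diff_db ** 2)))

# Toy reference response: a first-order low-pass sampled on 512 frequency bins.
freqs = np.linspace(100.0, 16000.0, 512)
h_ref = 1.0 / (1.0 + 1j * freqs / 4000.0)

# A test response that differs by a uniform +2 dB gain at every frequency.
h_test = h_ref * 10 ** (2.0 / 20.0)

lsd_same = log_spectral_distance(h_ref, h_ref)     # identical spectra
lsd_offset = log_spectral_distance(h_ref, h_test)  # uniform 2 dB offset
print(f"identical: {lsd_same:.2f} dB, 2 dB offset: {lsd_offset:.2f} dB")
```

With this definition, a response that is uniformly 2 dB louder than the reference yields an LSD of about 2 dB, which gives a rough sense of scale for the reported 2.02 dB.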
Beyond accuracy, SHroom's architecture is built for performance, especially in the dynamic, multi-source environments common in VR. Its "fixed-once decode" feature amortizes the decoding cost across sound sources: every source is encoded into the same SH sound field, and that field is decoded only once. As a result, when scaling from 1 to 8 concurrent sources, SHroom's slowdown compared to a naive approach narrows from 7x to just 3.1x. For real-time applications, its most critical feature is handling dynamic head rotation with a Wigner-D matrix multiplication that takes less than 1 millisecond per audio frame. This makes SHroom, according to the paper, the only architecturally viable choice for real-time, interactive 3D audio simulation, unlocking new fidelity for immersive experiences and acoustic research.
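The decode-once idea and the head-rotation step can be sketched with first-order Ambisonics in plain NumPy. Everything here is a simplified assumption for illustration: the helper names `encode_foa` and `yaw_rotation_foa` are hypothetical, the random 2x4 `decoder` stands in for a real frequency-dependent binaural decoder such as MagLS, and the rotation is a yaw-only, order-1 special case of the general Wigner-D matrix. It does not reproduce SHroom's API.

```python
import numpy as np

def encode_foa(signal, azimuth, elevation):
    """Encode a mono signal as first-order Ambisonics (ACN order W, Y, Z, X; SN3D gains)."""
    gains = np.array([
        1.0,                                  # W: omnidirectional
        np.sin(azimuth) * np.cos(elevation),  # Y
        np.sin(elevation),                    # Z
        np.cos(azimuth) * np.cos(elevation),  # X
    ])
    return gains[:, None] * signal[None, :]   # shape (4, n_samples)

def yaw_rotation_foa(alpha):
    """4x4 matrix rotating a first-order sound field by alpha radians about the z axis."""
    c, s = np.cos(alpha), np.sin(alpha)
    rot = np.eye(4)               # W and Z are unchanged by a yaw rotation
    rot[1, 1], rot[1, 3] = c, s   # Y' = c*Y + s*X
    rot[3, 1], rot[3, 3] = -s, c  # X' = -s*Y + c*X
    return rot

rng = np.random.default_rng(0)
n_samples = 256
sources = [rng.standard_normal(n_samples) for _ in range(8)]
azimuths = np.linspace(0.0, 2.0 * np.pi, 8, endpoint=False)

# Hypothetical stand-in for a binaural decoder: 2 ears x 4 SH channels.
decoder = rng.standard_normal((2, 4))

# Encode every source into one shared sound field (cheap, per-source work).
sound_field = sum(encode_foa(s, az, 0.0) for s, az in zip(sources, azimuths))

# Head turns by 30 degrees, so the scene rotates by -30 degrees relative to the head.
head_yaw = np.deg2rad(30.0)
rotated_field = yaw_rotation_foa(-head_yaw) @ sound_field

# Decode once for all sources.
binaural = decoder @ rotated_field

# Naive alternative: re-encode and decode each source separately, then sum.
naive = sum(decoder @ encode_foa(s, az - head_yaw, 0.0)
            for s, az in zip(sources, azimuths))

print(np.allclose(binaural, naive))
```

Because encoding, rotation, and decoding are all linear, the single rotate-and-decode pass gives the same output as processing each source separately, while the expensive decode runs once regardless of source count. At higher SH orders the rotation matrix grows to (N+1)^2 by (N+1)^2 but stays block-diagonal per order.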
- Achieves perceptual transparency with a Log Spectral Distance of 2.02 dB, at the upper edge of the 1-2 dB human Just Noticeable Difference range.
- Enables real-time dynamic head rotation for VR/AR with computational latency of under 1ms per audio frame.
- Scales well with source count: the slowdown relative to a reference narrows from 7x for a single source to 3.1x for 8 sources.
Why It Matters
SHroom gives audio engineers and XR developers a high-performance, open-source tool for creating convincingly realistic 3D soundscapes in immersive applications.