Beyond the Baseband: Adaptive Multi-Band Encoding for Full-Spectrum Bioacoustics Classification
Beyond 8 kHz baseband: new framework captures ultrasonic animal sounds missed by most AI models.
A multi-band encoding framework for bioacoustics classification decomposes the full spectrum of animal calls into per-band features and fuses them. It targets a limitation of most pre-trained audio models: because they expect 16 kHz input, they can represent only 0–8 kHz (the Nyquist limit) and discard everything above it. Classification experiments on three datasets, using eight pre-trained models and five fusion strategies, show that fused representations outperform baseband and time-expansion baselines on two of the three datasets, which points to the potential of full-spectrum encoding.
- Most bioacoustic AI models are limited to the 0–8 kHz baseband and miss the ultrasonic calls, extending up to 96 kHz, that bats and rodents use.
- The framework was tested with 8 pre-trained models and 5 fusion strategies across 3 datasets, outperforming baselines on 2 of them.
- Decorrelation analysis shows certain encoders produce band embeddings that improve class separation when fused.
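The decompose-then-fuse idea can be illustrated with a minimal sketch. It is not the authors' implementation: the summary does not say how bands are extracted or which five fusion strategies were tested. The sketch assumes a recording sampled at 192 kHz (so content reaches 96 kHz), 8 kHz-wide bands shifted down to baseband by FFT masking, and simple concatenation as the fusion step. `split_into_bands`, `toy_encoder`, and `fuse_concat` are hypothetical names, and `toy_encoder` only stands in for a pre-trained 16 kHz model.

```python
import numpy as np

def split_into_bands(x, sr, band_hz=8000.0):
    """Split a real signal into contiguous band_hz-wide bands, each shifted to baseband.

    Returns (bands, out_sr): bands has shape (n_bands, n_out) and every row is a
    real signal at sample rate out_sr = 2 * band_hz (16 kHz for 8 kHz bands).
    """
    n = len(x)
    spectrum = np.fft.rfft(x)
    freqs = np.fft.rfftfreq(n, d=1.0 / sr)
    n_bands = int((sr / 2) // band_hz)
    out_sr = 2 * band_hz
    n_out = int(round(n * out_sr / sr))
    bands = []
    for b in range(n_bands):
        lo, hi = b * band_hz, (b + 1) * band_hz
        sub = spectrum[(freqs >= lo) & (freqs < hi)]
        # Copy the band's bins to the bottom of an otherwise empty baseband spectrum.
        base = np.zeros(n_out // 2 + 1, dtype=complex)
        base[: len(sub)] = sub[: len(base)]
        # irfft ignores the imaginary part of bin 0; rescale to keep the amplitude.
        bands.append(np.fft.irfft(base, n=n_out) * (n_out / n))
    return np.stack(bands), out_sr

def toy_encoder(band, dim=4):
    """Hypothetical stand-in for a pre-trained encoder: log-energy in `dim` sub-bands."""
    power = np.abs(np.fft.rfft(band)) ** 2
    return np.log1p(np.array([chunk.sum() for chunk in np.array_split(power, dim)]))

def fuse_concat(band_embeddings):
    """Assumed fusion strategy: concatenate the per-band embeddings."""
    return np.concatenate(band_embeddings)

# Demo: a 20 kHz tone (inaudible to a 0-8 kHz model) in a 0.1 s clip at 192 kHz.
sr = 192_000
t = np.arange(int(0.1 * sr)) / sr
signal = np.sin(2 * np.pi * 20_000 * t)

bands, out_sr = split_into_bands(signal, sr)
band_energy = (bands ** 2).sum(axis=1)
loudest_band = int(np.argmax(band_energy))  # index of the band holding the tone

# Frequency at which the tone appears inside its band after shifting to baseband.
band_spec = np.abs(np.fft.rfft(bands[loudest_band]))
shifted_peak_hz = np.fft.rfftfreq(bands.shape[1], d=1.0 / out_sr)[np.argmax(band_spec)]

fused = fuse_concat([toy_encoder(b) for b in bands])
```

In this sketch the 20 kHz tone falls in the 16–24 kHz band and reappears 4 kHz above that band's lower edge, so a model limited to 0–8 kHz input could in principle process it. A real pipeline would replace `toy_encoder` with an actual pre-trained model and might use a different band layout or fusion method.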
Why It Matters
Full-spectrum encoding could let AI-driven wildlife monitoring capture the whole range of animal communication, supporting conservation and behavioral research.