Audio & Speech

Beyond the Baseband: Adaptive Multi-Band Encoding for Full-Spectrum Bioacoustics Classification

Beyond 8 kHz baseband: new framework captures ultrasonic animal sounds missed by most AI models.

Deep Dive

A multi-band encoding framework for bioacoustics classification decomposes the full spectrum of animal calls into per-band features and fuses them. It addresses a limitation of most pre-trained audio models: they operate on 16 kHz audio, so everything above the 8 kHz Nyquist limit is discarded. Classification experiments on three datasets, using eight pre-trained models and five fusion strategies, show that fused representations outperform baseband and time-expansion baselines on two of the three datasets, which suggests that full-spectrum encoding has potential.
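
As a rough illustration of the decompose-then-fuse idea, the Python sketch below splits a recording into 8 kHz-wide bands, shifts each band down to baseband so it fits a 16 kHz model, encodes each band, and fuses the results. Everything named here is an assumption made for illustration: the 192 kHz sample rate (implied by a 96 kHz upper limit), the FFT-based band shifting, the `toy_encoder` stand-in for a pre-trained model, and the two fusion options. The source does not specify the framework's actual band-splitting method or name its five fusion strategies.

```python
import numpy as np

def split_into_bands(x, sr, band_hz=8000):
    """Split x into consecutive band_hz-wide bands.

    Each band is shifted down to 0..band_hz and returned at a sample
    rate of 2 * band_hz (16 kHz by default). Assumed method, not the
    paper's: copy the band's FFT bins into baseband and invert.
    """
    n = len(x)
    spectrum = np.fft.rfft(x)
    freqs = np.fft.rfftfreq(n, d=1.0 / sr)
    n_bands = int((sr / 2) // band_hz)
    out_len = int(round(n * (2 * band_hz) / sr))  # samples at the target rate
    bands = []
    for k in range(n_bands):
        lo, hi = k * band_hz, (k + 1) * band_hz
        band_bins = spectrum[(freqs >= lo) & (freqs < hi)]
        base = np.zeros(out_len // 2 + 1, dtype=complex)
        m = min(len(band_bins), len(base))
        base[:m] = band_bins[:m]  # band now occupies 0..band_hz
        # Rescale so amplitudes survive the change in signal length.
        bands.append(np.fft.irfft(base, n=out_len) * (out_len / n))
    return np.stack(bands)

def toy_encoder(band, dim=32):
    """Hypothetical stand-in for a pre-trained 16 kHz encoder:
    a pooled log-magnitude spectrum of length dim."""
    mag = np.abs(np.fft.rfft(band))
    chunks = np.array_split(mag, dim)
    return np.log1p(np.array([c.mean() for c in chunks]))

def fuse(embeddings, strategy="concat"):
    """Fuse per-band embeddings of shape (n_bands, dim).
    Both strategies are generic examples, not the paper's list."""
    if strategy == "concat":
        return embeddings.reshape(-1)
    if strategy == "mean":
        return embeddings.mean(axis=0)
    raise ValueError(f"unknown strategy: {strategy}")

# One second of a synthetic 42 kHz tone, well above the 8 kHz baseband.
sr = 192_000
t = np.arange(sr) / sr
x = np.sin(2 * np.pi * 42_000 * t)

bands = split_into_bands(x, sr)                    # 12 bands of 16,000 samples
emb = np.stack([toy_encoder(b) for b in bands])    # one embedding per band
fused = fuse(emb, "concat")                        # single full-spectrum vector
band_energy = (bands ** 2).sum(axis=1)             # the tone lands in band 5 (40-48 kHz)
```

A baseband-only pipeline would keep just `bands[0]`, which contains none of the 42 kHz tone; the fused vector retains it because the 40–48 kHz band is encoded alongside the others.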

Key Points
  • Most bioacoustic AI models are limited to the 0–8 kHz baseband and miss ultrasonic calls, which reach up to 96 kHz, used by bats and rodents.
  • The framework was tested with 8 pre-trained models and 5 fusion strategies across 3 datasets; fused representations outperformed the baseband and time-expansion baselines on 2 of them.
  • Decorrelation analysis shows certain encoders produce band embeddings that improve class separation when fused.
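
The source does not say how decorrelation was measured. One plausible way to quantify it, sketched below in Python under that assumption, is to check how strongly the embedding dimensions of one band can be predicted from those of another: low cross-band correlation suggests that the second band contributes information the first lacks. The function name `cross_band_correlation` and the synthetic data are hypothetical.

```python
import numpy as np

def cross_band_correlation(emb_a, emb_b):
    """Redundancy score between two band embeddings of shape (n_clips, dim).

    For each dimension of band A, take the largest absolute Pearson
    correlation with any dimension of band B, then average.
    Values near 1 mean redundant bands; values near 0 mean decorrelated.
    """
    a = (emb_a - emb_a.mean(axis=0)) / (emb_a.std(axis=0) + 1e-12)
    b = (emb_b - emb_b.mean(axis=0)) / (emb_b.std(axis=0) + 1e-12)
    corr = a.T @ b / len(a)  # (dim, dim) cross-correlation matrix
    return float(np.abs(corr).max(axis=1).mean())

# Synthetic embeddings standing in for real encoder outputs.
rng = np.random.default_rng(0)
low_band = rng.normal(size=(500, 16))
redundant_band = low_band + 0.1 * rng.normal(size=low_band.shape)  # near copy
independent_band = rng.normal(size=(500, 16))                      # new information

r_redundant = cross_band_correlation(low_band, redundant_band)
r_independent = cross_band_correlation(low_band, independent_band)
```

In this toy setup `r_redundant` comes out far higher than `r_independent`. Low correlation alone does not guarantee better class separation, so a score like this would be read alongside classification results rather than in place of them.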

Why It Matters

Full-spectrum encoding lets AI-driven wildlife monitoring capture animal communication across the entire frequency range, not just the baseband, which can improve conservation and behavioral research.