BioSEN: New AI Model Boosts Animal Vocalization Clarity for Conservation
Adapting speech enhancement to clean up noisy animal calls with far less computation.
Most audio enhancement models are designed for human speech. Bioacoustic signals such as animal calls, bird songs, and whale clicks are poorly served, both because field recordings are noisy and because non-human sounds have their own distinct harmonic structures. To address this gap, an international team (Tianyu Song, Ton Viet Ta, Ngamta Thamwattana, Hisako Nomura, and Linh Thi Hoai Nguyen) built BioSEN, a dedicated model for cleaning up animal vocalizations.
BioSEN incorporates three key modules: a multi-scale dual-axis attention unit that extracts time-frequency features across multiple resolutions; a bio-harmonic multi-scale enhancement unit specifically designed to capture the harmonic patterns common in animal sounds; and an energy-adaptive gating connection unit that uses frequency-dependent weights to prevent vocalizations from being mistakenly removed as background noise. This architecture adapts proven speech enhancement techniques to the distinct characteristics of bioacoustics.
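To make the gating idea concrete, here is a minimal sketch of a frequency-dependent, energy-driven gate applied to a magnitude spectrogram. It is an illustration only, not BioSEN's implementation: the article gives no formula, so the sigmoid-of-log-energy form, the function names, and the per-band `weights` and `biases` are all assumptions standing in for parameters a network would learn.

```python
import math


def stable_sigmoid(x):
    """Numerically stable logistic function, output in (0, 1)."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    e = math.exp(x)
    return e / (1.0 + e)


def energy_adaptive_gate(spec, weights, biases, eps=1e-8):
    """Scale each frequency band by a gate computed from that band's energy.

    spec:    list of non-empty frequency bands, each a list of magnitudes over time
    weights: one hypothetical learnable weight per band (assumption)
    biases:  one hypothetical learnable bias per band (assumption)
    Returns (gated_spec, gates).
    """
    if not (len(spec) == len(weights) == len(biases)):
        raise ValueError("spec, weights, and biases need one entry per band")
    gated, gates = [], []
    for band, w, b in zip(spec, weights, biases):
        # Mean energy of the band across all time frames.
        mean_energy = sum(m * m for m in band) / len(band)
        # Frequency-dependent gate: each band has its own weight and bias.
        gate = stable_sigmoid(w * math.log(mean_energy + eps) + b)
        gates.append(gate)
        gated.append([gate * m for m in band])
    return gated, gates


# Toy input: band 0 holds a strong call, band 1 holds faint background noise.
spec = [[2.0, 3.0, 2.5], [0.05, 0.04, 0.06]]
gated, gates = energy_adaptive_gate(spec, weights=[1.0, 1.0], biases=[0.0, 0.0])
print(gates)
```

With identical weights and biases, the high-energy band passes through almost unchanged while the low-energy band is strongly attenuated. Because each band has its own bias in this sketch, a band where a species calls faintly could be given a larger bias so that its calls are not suppressed along with the noise, which is the failure mode the article says the gating unit is meant to prevent.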
In tests across three bioacoustic datasets covering various species and recording conditions, BioSEN matched or exceeded the performance of state-of-the-art speech enhancement models while requiring far less computation. That efficiency matters for deployment on edge devices in remote field settings, where compute and power are limited.
The paper, accepted at ICASSP 2026, positions BioSEN as a practical tool for wildlife monitoring, biodiversity assessment, and conservation. By making noisy field recordings intelligible, it enables researchers to automatically identify species, track population changes, and detect rare calls that would otherwise be lost in background noise.
- BioSEN uses three specialized modules: multi-scale dual-axis attention, bio-harmonic enhancement, and energy-adaptive gating.
- Matches or exceeds state-of-the-art speech enhancement models on three bioacoustic datasets while using significantly less computation.
- Enables clearer analysis of animal vocalizations from noisy field recordings for biodiversity monitoring and conservation.
Why It Matters
Better animal call detection from noisy recordings means more accurate wildlife monitoring and conservation at lower compute cost.