SEABAD dataset boosts tropical bird detection with 99.57% accuracy
50,000 curated clips from 1,677 Southeast Asian bird species...
Passive acoustic monitoring (PAM) generates huge volumes of audio, but most of it is non-informative. Bird audio detection (BAD) can filter out irrelevant recordings, but existing BAD systems are trained on temperate datasets and struggle with the dense, species-rich soundscapes of the tropics. To close this gap, researchers from the University of Malaya created SEABAD (Southeast Asian Bird Activity Detection), a dataset of 50,000 three-second clips balanced evenly between bird-present and bird-absent samples. The dataset spans 1,677 bird species and is standardized to 16 kHz mono audio for efficient inference on low-power edge devices. A dual-branch curation pipeline reduced class imbalance by 13.7% (Gini coefficient from 0.601 to 0.519), and manual auditing of 1,000 positive clips confirmed 97.8% ± 0.9% labeling accuracy.
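Two of the statistics quoted above can be sanity-checked with a few lines of code. The sketch below assumes the Gini coefficient is computed over per-class clip counts (the summary does not say which classes) and that the ± 0.9% is a 95% normal-approximation (Wald) interval; neither assumption is confirmed by the source. The function names `gini` and `wald_margin` and the toy counts are hypothetical and are not SEABAD data.

```python
import math

def gini(counts):
    """Gini coefficient of non-negative counts: 0 means perfectly even."""
    xs = sorted(counts)
    n = len(xs)
    total = sum(xs)
    if n == 0 or total == 0:
        raise ValueError("need at least one positive count")
    # Closed form on ascending-sorted values, with ranks i = 1..n:
    # G = 2 * sum(i * x_i) / (n * sum(x_i)) - (n + 1) / n
    weighted = sum(i * x for i, x in enumerate(xs, start=1))
    return 2 * weighted / (n * total) - (n + 1) / n

def wald_margin(p, n, z=1.96):
    """Half-width of a normal-approximation interval for a proportion."""
    return z * math.sqrt(p * (1 - p) / n)

# Hypothetical per-class clip counts, for illustration only.
toy_counts = [1, 1, 2, 4, 12]
print(f"toy Gini: {gini(toy_counts):.3f}")             # toy Gini: 0.500

# Relative reduction implied by the rounded coefficients quoted above.
reduction = (0.601 - 0.519) / 0.601
print(f"relative reduction: {reduction:.1%}")          # relative reduction: 13.6%

# Margin for 97.8% accuracy on 1,000 audited clips (assumed 95% level).
print(f"margin: {wald_margin(0.978, 1000):.1%}")       # margin: 0.9%
```

The rounded coefficients give about 13.6%, close to the reported 13.7%; the small difference would be consistent with the reported figure being derived from unrounded coefficients, though the source does not say. Under the Wald assumption, the margin for 1,000 audited clips matches the quoted ± 0.9%.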
Baseline experiments with MobileNetV3-Small, a lightweight model suitable for edge deployment, achieved 99.57% ± 0.25% accuracy and 0.9985 ± 0.0002 AUC across three random seeds. The dataset and full curation pipeline are publicly released to support tropical BAD research and enable energy-efficient acoustic monitoring in biodiversity hotspots. The release addresses a critical gap: most acoustic monitoring technology is built for temperate regions, yet tropical ecosystems hold the majority of the world's avian biodiversity and face the greatest conservation pressures. By providing a high-quality dataset formatted for low-power inference, SEABAD supports scalable, real-time biodiversity assessment in the tropics.
- SEABAD contains 50,000 three-second audio clips balanced 50/50 between bird-present and bird-absent samples
- Dataset covers 1,677 Southeast Asian bird species, standardized to 16 kHz mono for edge deployment
- MobileNetV3-Small baseline achieves 99.57% accuracy and 0.9985 AUC; labeling accuracy is 97.8%
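The clip format in the list above fixes the input size a detector sees: 16,000 samples per second for three seconds is 48,000 samples per clip. As a rough illustration, the sketch below downmixes stereo frames to mono and cuts a recording into non-overlapping three-second clips. It assumes the audio has already been resampled to 16 kHz, and the helper names `to_mono` and `split_into_clips` are hypothetical; the summary does not describe how SEABAD's clips were actually extracted.

```python
SAMPLE_RATE_HZ = 16_000   # from the text: 16 kHz
CLIP_SECONDS = 3          # from the text: three-second clips
CLIP_SAMPLES = SAMPLE_RATE_HZ * CLIP_SECONDS   # 48,000 samples per clip

# Total audio in the dataset: 50,000 clips of 3 s each, in hours.
TOTAL_HOURS = 50_000 * CLIP_SECONDS / 3600     # about 41.7 hours

def to_mono(frames):
    """Average the channels of each frame, e.g. (left, right) -> one sample."""
    return [sum(frame) / len(frame) for frame in frames]

def split_into_clips(samples, clip_samples=CLIP_SAMPLES):
    """Cut a mono signal into non-overlapping fixed-length clips.

    A trailing remainder shorter than one clip is dropped. This is an
    assumption; zero-padding it would be an equally reasonable choice.
    """
    n_full = len(samples) // clip_samples
    return [samples[i * clip_samples:(i + 1) * clip_samples]
            for i in range(n_full)]

# Ten seconds of silent stereo audio at 16 kHz as stand-in input.
stereo = [(0.0, 0.0)] * (10 * SAMPLE_RATE_HZ)
clips = split_into_clips(to_mono(stereo))
print(len(clips), len(clips[0]))   # 3 48000
```

A fixed clip length keeps the model input shape constant, which simplifies batching and memory planning on constrained hardware.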
Why It Matters
SEABAD enables low-power acoustic monitoring in tropical forests, which hold most of the world's avian biodiversity but lack detection tools built for their soundscapes.