Audio & Speech

Acoustic Imaging for Low-SNR UAV Detection: Dense Beamformed Energy Maps and U-Net SELD

New AI system segments acoustic energy maps to locate drones with improved angular precision, even in noisy environments.

Deep Dive

A research team has developed an AI-powered acoustic imaging system that can detect and locate drones (UAVs) with high precision, even in low signal-to-noise ratio (SNR) conditions. The core innovation is treating 360° sound source localization as a spherical semantic segmentation task: instead of regressing discrete direction-of-arrival (DoA) angles, as traditional methods do, a modified U-Net architecture segments beamformed audio maps into regions of active sound presence. The model is trained on data from a custom 24-microphone array, with signals aligned to drone GPS telemetry to create accurate supervision masks.
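To make the supervision step concrete, here is a minimal sketch of how such a mask might be rasterized: the known drone DoA (derived from the GPS telemetry and array pose) is projected onto an equirectangular azimuth-elevation grid and dilated to a small angular tolerance. The grid resolution, the 5° radius, and the function name are illustrative assumptions, not details from the paper.

```python
import numpy as np

def doa_mask(az_deg, el_deg, n_az=360, n_el=180, radius_deg=5.0):
    """Rasterize a ground-truth DoA (from aligned GPS telemetry) into a
    binary mask on an equirectangular azimuth-elevation grid.
    Grid size and angular radius are illustrative defaults."""
    az = np.radians(np.linspace(0.0, 360.0, n_az, endpoint=False))
    el = np.radians(np.linspace(-90.0, 90.0, n_el))
    az_g, el_g = np.meshgrid(az, el)                    # both (n_el, n_az)
    a0, e0 = np.radians(az_deg), np.radians(el_deg)

    # Great-circle distance between every grid cell and the true DoA.
    cos_d = (np.sin(el_g) * np.sin(e0)
             + np.cos(el_g) * np.cos(e0) * np.cos(az_g - a0))
    dist_deg = np.degrees(np.arccos(np.clip(cos_d, -1.0, 1.0)))

    # Cells within the tolerance form the "active source" class.
    return (dist_deg <= radius_deg).astype(np.float32)

# Example: drone at 120 deg azimuth, 15 deg elevation.
mask = doa_mask(120.0, 15.0)
print(mask.shape, int(mask.sum()))   # (180, 360) and the active-cell count
```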

Key to the system's performance is its use of delay-and-sum (DAS) beamforming to generate frequency-domain energy maps, which the U-Net learns to interpret. Because silent areas vastly outnumber active sound regions, the model counters this class imbalance with the Tversky loss function. A significant advantage is the method's inherent array-independence: since the network operates on beamformed maps rather than raw microphone channels, it can adapt to different microphone configurations without retraining from scratch. Final DoA estimates are derived by computing centroids over the model's activated segmentation regions.
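A minimal frequency-domain DAS sketch illustrates the map generation: for each candidate direction on the spherical grid, steering delays are computed from the microphone geometry, applied as compensating phase shifts, and the coherently summed power is accumulated into an energy map. The geometry, grids, and loop structure below are placeholders rather than the paper's configuration.

```python
import numpy as np

C = 343.0  # speed of sound, m/s

def das_energy_map(stft, freqs, mic_xyz, az_grid, el_grid):
    """Frequency-domain delay-and-sum energy map for one STFT frame.

    stft    : (n_mics, n_freqs) complex spectra
    freqs   : (n_freqs,) bin frequencies in Hz
    mic_xyz : (n_mics, 3) microphone positions in metres
    az_grid, el_grid : candidate angles in radians
    returns : (n_el, n_az) beamformed power map
    """
    energy = np.zeros((len(el_grid), len(az_grid)))
    for i, el in enumerate(el_grid):
        for j, az in enumerate(az_grid):
            # Unit vector toward the candidate direction.
            u = np.array([np.cos(el) * np.cos(az),
                          np.cos(el) * np.sin(az),
                          np.sin(el)])
            tau = mic_xyz @ u / C                     # far-field delays (s)
            # Compensating phase shifts, then coherent sum across mics.
            steer = np.exp(-2j * np.pi * freqs[None, :] * tau[:, None])
            energy[i, j] = np.sum(np.abs((stft * steer).sum(axis=0)) ** 2)
    return energy
```

Note that only `mic_xyz` encodes the array, so a different microphone configuration yields an energy map of the same shape; this is the property that lets the trained U-Net transfer across arrays without starting from scratch.

Training and decoding can be sketched similarly: the Tversky loss below reweights false positives against false negatives to cope with the heavy silence-to-source imbalance, and the centroid routine turns an activated region back into a point DoA. The weights and the probability-weighted centroid are assumptions for illustration; the Tversky index formulation itself is standard.

```python
import numpy as np
import torch

def tversky_loss(pred, target, alpha=0.3, beta=0.7, eps=1e-6):
    """Tversky loss: 1 - TP / (TP + alpha*FP + beta*FN).
    beta > alpha penalizes missed source pixels more than false alarms,
    useful when silence dominates the map. Weights here are illustrative."""
    tp = (pred * target).sum(dim=(-2, -1))
    fp = (pred * (1 - target)).sum(dim=(-2, -1))
    fn = ((1 - pred) * target).sum(dim=(-2, -1))
    return (1.0 - (tp + eps) / (tp + alpha * fp + beta * fn + eps)).mean()

def centroid_doa(prob_map, az_grid, el_grid, thresh=0.5):
    """Collapse an activated region of the sigmoid output (a NumPy array,
    angles in radians) into one DoA estimate. Averaging unit vectors
    rather than raw angles avoids the 0/360-degree wrap problem."""
    w = prob_map * (prob_map >= thresh)
    if w.sum() == 0:
        return None                                    # no detection
    el_g, az_g = np.meshgrid(el_grid, az_grid, indexing="ij")
    x = (w * np.cos(el_g) * np.cos(az_g)).sum()
    y = (w * np.cos(el_g) * np.sin(az_g)).sum()
    z = (w * np.sin(el_g)).sum()
    az = np.arctan2(y, x) % (2 * np.pi)
    el = np.arctan2(z, np.hypot(x, y))
    return np.degrees(az), np.degrees(el)
```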

The researchers validated their system with a dataset of real-world, open-field recordings of a DJI Air 3 drone, synchronized with 360° video and flight logs. Experimental results demonstrate that the U-Net model generalizes effectively across different environments, offering improved angular precision over traditional Sound Source Localization (SSL) techniques. This work establishes a new paradigm for dense spatial audio understanding, moving beyond point estimates to provide a richer, segmented acoustic image of a scene.

Key Points
  • Uses a modified U-Net model for spherical semantic segmentation of acoustic energy maps, rather than regressing discrete angles.
  • Trained on data from a custom 24-microphone array using delay-and-sum beamforming and synchronized DJI Air 3 drone GPS logs.
  • The array-independent approach generalizes across environments and offers improved angular precision for robust drone detection in low-SNR conditions.

Why It Matters

Enables more reliable, passive detection of drones for security and monitoring applications, especially in acoustically challenging environments.