Audio & Speech

Integrated Gradients localizes sound events with 82.6% Pointing Game accuracy, approaching supervised models

arXiv eess.AS May 25, 2026

⚡Can post-hoc attribution methods localize sounds as well as trained detectors?

Deep Dive

The authors test Integrated Gradients (IG) as a post-hoc method for temporal sound event detection on a 10-class domestic audio dataset. Without any temporal training labels, IG achieves a mean IoU of 0.39, a frame-level F1 of 0.52, and a Pointing Game (PG) accuracy of 82.6%. For comparison, a weakly-supervised CNN trained on clip-level labels achieves 0.42 IoU, 0.55 F1, and 97.3% PG, while a strongly-supervised CNN trained on frame-level labels achieves 0.45 IoU, 0.58 F1, and 97.9% PG. IG's IoU and F1 trail the CNNs by only 0.03 to 0.06, although its PG accuracy is about 15 points lower. The results suggest that post-hoc IG captures meaningful temporal activity patterns, with overlap-based localization approaching that of models that explicitly produce frame-level predictions.
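To make the mechanism concrete, here is a minimal NumPy sketch of Integrated Gradients applied to a spectrogram and collapsed to one saliency score per frame. The classifier (`clip_score`), its weights, the all-zero baseline, the 64-step midpoint approximation, and the sum over frequency bins are illustrative assumptions, not the authors' setup. The paper's models are CNNs, which would need automatic differentiation (for example, PyTorch with Captum's `IntegratedGradients`) instead of the hand-written gradient used here.

```python
import numpy as np


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


def clip_score(x, v, b):
    """Hypothetical clip-level classifier: the same per-mel weights v are applied
    to every frame of a (frames, mel_bins) spectrogram, followed by a sigmoid."""
    return sigmoid(np.sum(x @ v) + b)


def clip_score_grad(x, v, b):
    """Analytic gradient of clip_score with respect to every input cell."""
    s = clip_score(x, v, b)
    return s * (1.0 - s) * np.broadcast_to(v, x.shape)


def integrated_gradients(x, baseline, v, b, steps=64):
    """Midpoint Riemann approximation of IG along the straight path baseline -> x."""
    alphas = (np.arange(steps) + 0.5) / steps
    grad_sum = np.zeros_like(x)
    for a in alphas:
        grad_sum += clip_score_grad(baseline + a * (x - baseline), v, b)
    return (x - baseline) * grad_sum / steps


def frame_saliency(attributions):
    """Sum attributions over the frequency axis to get one score per frame."""
    return attributions.sum(axis=1)


# Toy input: 50 frames x 16 mel bins, with a synthetic event in frames 20-29.
rng = np.random.default_rng(0)
x = rng.normal(size=(50, 16))
x[20:30] += 3.0
baseline = np.zeros_like(x)  # all-zero baseline (assumed; other baselines are possible)
v = np.full(16, 0.05)        # hypothetical classifier weights
b = -12.0                    # hypothetical bias

attr = integrated_gradients(x, baseline, v, b)
saliency = frame_saliency(attr)
print("peak frame:", int(np.argmax(saliency)))

# Completeness check: attributions should sum to f(x) - f(baseline).
gap = attr.sum() - (clip_score(x, v, b) - clip_score(baseline, v, b))
print("completeness gap:", gap)
```

The final check reflects IG's completeness property: the attributions sum to the difference between the model score at the input and at the baseline, which offers a quick test of whether the number of integration steps is adequate.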

Key Points

Integrated Gradients achieves 0.39 IoU, 0.52 F1, and 82.6% Pointing Game accuracy on a 10-class domestic sound dataset.
IoU and F1 nearly match the weakly-supervised (0.42 IoU, 0.55 F1) and strongly-supervised (0.45 IoU, 0.58 F1) CNN baselines, though Pointing Game accuracy trails both (97.3% and 97.9%).
All methods significantly outperform random and energy-based baselines, supporting IG as a post-hoc temporal localization tool.
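The three metrics and the energy baseline can be sketched on boolean frame masks as follows. The thresholding rule in `binarize` (half of the peak saliency), the per-frame power used by `energy_saliency`, and the toy arrays are assumptions for illustration; the summary does not say how the paper binarizes attribution maps or defines its energy baseline.

```python
import numpy as np


def binarize(saliency, ratio=0.5):
    """Mark frames whose saliency is at least `ratio` times the maximum (assumed rule)."""
    return saliency >= ratio * saliency.max()


def frame_iou(pred, gt):
    """Intersection over union of two boolean frame masks."""
    union = np.logical_or(pred, gt).sum()
    return np.logical_and(pred, gt).sum() / union if union else 1.0


def frame_f1(pred, gt):
    """Frame-level F1 = 2TP / (2TP + FP + FN)."""
    tp = np.logical_and(pred, gt).sum()
    fp = np.logical_and(pred, ~gt).sum()
    fn = np.logical_and(~pred, gt).sum()
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 1.0


def pointing_game_hit(saliency, gt):
    """Hit if the single most salient frame lies inside the ground-truth event."""
    return bool(gt[np.argmax(saliency)])


def energy_saliency(power_spec):
    """Energy baseline (assumed form): total power per frame of a (frames, bins) array."""
    return power_spec.sum(axis=1)


# Toy ground truth: the event occupies frames 3-6 of a 10-frame clip.
gt = np.zeros(10, dtype=bool)
gt[3:7] = True

# Hypothetical attribution-derived saliency for the same clip.
saliency = np.array([0.0, 0.1, 0.6, 0.9, 1.0, 0.8, 0.2, 0.1, 0.0, 0.0])
pred = binarize(saliency)
print(f"IoU {frame_iou(pred, gt):.2f}, F1 {frame_f1(pred, gt):.2f}, "
      f"PG hit {pointing_game_hit(saliency, gt)}")

# Energy baseline on a toy power spectrogram with a loud distractor in frame 8.
power = np.ones((10, 4))
power[3:7] *= 5.0
power[8] *= 8.0
energy = energy_saliency(power)
print(f"energy IoU {frame_iou(binarize(energy), gt):.2f}, "
      f"PG hit {pointing_game_hit(energy, gt)}")
```

Pointing Game accuracy as a percentage would be the mean of `pointing_game_hit` over all evaluated events. In the toy example, the loud distractor in frame 8 captures the energy baseline's peak, illustrating one way an energy detector can miss a target event.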

Why It Matters

Suggests that a post-hoc explainability method can localize audio events without expensive temporal labels, which could reduce annotation costs.
