Adversarial Contrastive Learning Enables Few-Shot Audio Classification Across Domains
A new method handles domain shift and reports higher average accuracy than prior state-of-the-art approaches on six pairs of cross-domain audio datasets.
Current few-shot class-incremental audio classification (FCAC) methods assume that base and incremental samples share the same distribution, but real-world applications often involve domain shifts (e.g., different recording environments). To address this, researchers Yongjie Si, Yanxiong Li, Sen Huang, and Beibei Liu introduce an adversarial contrastive learning strategy. Their model consists of an encoder that is trained only in the base session and then frozen, and a classifier that is updated in each incremental session. Because the encoder is trained with an adversarial contrastive loss to produce domain-invariant features, the classifier can identify new classes from unseen domains without catastrophic forgetting of the classes it already knows.
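The split between a frozen encoder and an incrementally updated classifier can be sketched as follows. This is a minimal illustration, not the authors' implementation: the random-projection `FrozenEncoder` stands in for a trained network, and the mean-embedding `PrototypeClassifier` is an assumed, commonly used choice for the incremental classifier. All class and method names here are hypothetical.

```python
import numpy as np


class FrozenEncoder:
    """Stand-in for an encoder trained in the base session and then frozen.

    Assumption: a fixed random projection replaces the real trained network.
    """

    def __init__(self, in_dim, emb_dim, seed=0):
        rng = np.random.default_rng(seed)
        self._w = rng.standard_normal((in_dim, emb_dim))  # never updated

    def __call__(self, x):
        z = np.asarray(x, dtype=float) @ self._w
        # L2-normalize so that dot products are cosine similarities.
        return z / (np.linalg.norm(z, axis=-1, keepdims=True) + 1e-12)


class PrototypeClassifier:
    """Hypothetical incremental classifier: one mean embedding per class."""

    def __init__(self, encoder):
        self.encoder = encoder
        self.prototypes = {}  # class label -> prototype vector

    def add_classes(self, support_x, support_y):
        """Register new classes from a few labeled examples (one session)."""
        emb = self.encoder(support_x)
        support_y = np.asarray(support_y)
        for label in np.unique(support_y):
            # Only entries for the new labels are written; prototypes of
            # earlier classes and the encoder weights are left untouched.
            self.prototypes[int(label)] = emb[support_y == label].mean(axis=0)

    def predict(self, x):
        """Assign each input to the class with the most similar prototype."""
        labels = np.array(list(self.prototypes))
        protos = np.stack([self.prototypes[int(l)] for l in labels])
        protos = protos / (np.linalg.norm(protos, axis=1, keepdims=True) + 1e-12)
        scores = self.encoder(x) @ protos.T
        return labels[scores.argmax(axis=1)]
```

In this sketch, the base session calls `add_classes` once with the base data, and each later session calls it again with the few labeled examples of the new classes. Since earlier prototypes are never overwritten, old classes are not forgotten by construction; whether the features transfer to a new domain depends entirely on how the encoder was trained.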
The method was evaluated on six pairs of cross-domain datasets, where it achieved higher average accuracy than state-of-the-art approaches. The paper, accepted at Interspeech 2026, is five pages long and includes three figures. Code is publicly available. The work is a step toward practical audio AI systems that can adapt to changing environments, such as smart assistants handling new sounds in different rooms or mobile devices facing varying acoustic conditions.
- Proposes the first solution for domain shift in few-shot class-incremental audio classification (FCAC).
- Uses adversarial contrastive training with a frozen encoder to learn domain-invariant features.
- Outperforms existing SOTA methods across six cross-domain dataset pairs in average accuracy.
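To make the contrastive part of the training concrete, here is a minimal sketch of a standard supervised contrastive loss in NumPy. It is a swapped-in, generic formulation and not the paper's loss: this summary does not give the exact objective or describe how the adversarial component is built, so that part is omitted. The function name and the `temperature` default are assumptions.

```python
import numpy as np


def supervised_contrastive_loss(embeddings, labels, temperature=0.1):
    """Generic supervised contrastive loss (assumed form, not the paper's).

    Pulls same-class embeddings together and pushes other samples apart.
    Requires at least one same-class pair in the batch.
    """
    z = np.asarray(embeddings, dtype=float)
    z = z / (np.linalg.norm(z, axis=1, keepdims=True) + 1e-12)
    labels = np.asarray(labels)
    n = len(labels)

    # Scaled cosine similarities, with self-comparisons masked out.
    not_self = ~np.eye(n, dtype=bool)
    sim = np.where(not_self, z @ z.T / temperature, -np.inf)

    # Numerically stable log-softmax over all other samples in the batch.
    sim_max = sim.max(axis=1, keepdims=True)
    log_prob = sim - sim_max - np.log(np.exp(sim - sim_max).sum(axis=1, keepdims=True))

    # Positives: other samples that share the anchor's class label.
    positives = (labels[:, None] == labels[None, :]) & not_self
    n_pos = positives.sum(axis=1)
    has_pos = n_pos > 0

    # Average log-probability of the positives, per anchor that has any.
    pos_log_prob = np.where(positives, log_prob, 0.0).sum(axis=1)
    return float(-(pos_log_prob[has_pos] / n_pos[has_pos]).mean())
```

In a cross-domain setting, one plausible use (again an assumption) is to build batches in which same-class samples come from different recording conditions, so that minimizing the loss rewards features that ignore the domain.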
Why It Matters
Enables audio AI to learn new classes under real-world domain changes without full retraining.