XAttnMark audio watermarking achieves SOTA detection and attribution with cross-attention
A new cross-attention watermark stays detectable after generative audio editing at varying strengths.
The rapid rise of generative audio synthesis and editing has created an urgent need for robust watermarking to protect copyright and combat deepfake misinformation. Existing neural methods such as WavMark and AudioSeal struggle to optimize detection (is a watermark present?) and attribution (which embedded message does it carry?) jointly. XAttnMark bridges this gap with an architecture in which the generator and detector share part of their parameters and are linked through cross-attention. It also introduces a temporal conditioning module to improve message distribution, and a psychoacoustic-aligned time-frequency masking loss that models fine-grained auditory masking so the watermark is harder to hear.
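To make the generator-detector pairing concrete, here is a minimal, untrained toy sketch in NumPy. It is not the XAttnMark implementation: the function names, the per-bit embedding table, the dimensions, and the additive noise that stands in for the audio channel are all assumptions made for illustration. It only shows the general idea of a table that the embedding side uses to encode bits and the detecting side reuses as attention keys to read them back.

```python
import numpy as np

def softmax(x, axis=-1):
    """Numerically stable softmax."""
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def make_shared_table(n_bits, dim, rng):
    """Hypothetical shared parameters: one unit-norm vector per (bit position, bit value)."""
    table = rng.standard_normal((n_bits, 2, dim))
    return table / np.linalg.norm(table, axis=-1, keepdims=True)

def embed_message(bits, table):
    """Generator side (toy): look up the code vector for each bit. Shape (n_bits, dim)."""
    return table[np.arange(len(bits)), bits]

def retrieve_message(queries, table):
    """Detector side (toy): scaled dot-product attention from detector features (queries)
    onto the shared table (keys), restricted to each position's two candidate values."""
    dim = table.shape[-1]
    scores = np.einsum("id,ibd->ib", queries, table) / np.sqrt(dim)
    weights = softmax(scores, axis=-1)       # (n_bits, 2) attention weights
    return weights.argmax(axis=-1), weights  # most-attended value = decoded bit

# Toy round trip. The noise is an assumed stand-in for everything between
# embedding and detection (audio synthesis, edits, feature extraction).
rng = np.random.default_rng(0)
table = make_shared_table(n_bits=16, dim=64, rng=rng)
message = rng.integers(0, 2, size=16)
codes = embed_message(message, table)
received = codes + 0.05 * rng.standard_normal(codes.shape)
decoded, weights = retrieve_message(received, table)
```

Because both sides index the same table, any update to it would move the encoder and decoder together, which is one plausible reading of why partial parameter sharing helps joint optimization. In a real system the queries would come from a learned audio encoder rather than from noisy copies of the codes.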
In the authors' experiments, XAttnMark achieves state-of-the-art performance on both detection and attribution, and it remains robust to a wide range of audio transformations, including challenging generative editing at varying strengths. The work, accepted at ICML 2025, provides a practical solution for verifying audio provenance and intellectual property in the generative AI era.
- Cross-attention mechanism between generator and detector for efficient message retrieval and joint optimization.
- Psychoacoustic-aligned time-frequency masking loss improves imperceptibility by modeling human auditory masking effects.
- Superior robustness against generative audio editing (e.g., style transfer, re-synthesis) at varying strengths.
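The masking-loss idea in the second bullet can be illustrated with a crude proxy. The sketch below is an assumption-heavy stand-in, not the paper's loss: it uses a plain Hann-windowed STFT, approximates the masking threshold as the host's local peak power (over neighboring frequency bins and frames) lowered by a fixed offset, and penalizes only the watermark energy that rises above that threshold. The function names, window sizes, neighborhood radii, and the -20 dB offset are all hypothetical.

```python
import numpy as np

def stft_power(signal, frame=256, hop=128):
    """Power spectrogram from Hann-windowed frames. Shape (n_frames, frame // 2 + 1)."""
    window = np.hanning(frame)
    n_frames = 1 + (len(signal) - frame) // hop
    frames = np.stack([signal[i * hop : i * hop + frame] * window
                       for i in range(n_frames)])
    return np.abs(np.fft.rfft(frames, axis=-1)) ** 2

def spread(power, freq_radius=2, time_radius=1):
    """Local maximum over nearby bins and frames: a rough proxy for
    simultaneous (frequency) and temporal masking by the host signal."""
    padded = np.pad(power, ((time_radius, time_radius), (freq_radius, freq_radius)),
                    mode="edge")
    out = np.zeros_like(power)
    n_t, n_f = power.shape
    for dt in range(2 * time_radius + 1):
        for df in range(2 * freq_radius + 1):
            out = np.maximum(out, padded[dt:dt + n_t, df:df + n_f])
    return out

def masking_loss(host, watermarked, offset_db=-20.0):
    """Mean watermark power that exceeds the assumed masking threshold."""
    host_power = stft_power(host)
    residual_power = stft_power(watermarked - host)
    threshold = spread(host_power) * 10 ** (offset_db / 10)
    return float(np.maximum(residual_power - threshold, 0.0).mean())

# Toy comparison: the same small tone hidden near the host's pitch vs. far from it.
sr = 16000
t = np.arange(sr) / sr
host = np.sin(2 * np.pi * 440 * t)
near = host + 0.01 * np.sin(2 * np.pi * 500 * t)    # sits under the host's energy
far = host + 0.01 * np.sin(2 * np.pi * 5000 * t)    # sits in an empty band
loss_masked = masking_loss(host, near)
loss_unmasked = masking_loss(host, far)
loss_identity = masking_loss(host, host)
```

In this toy setup the tone placed next to the host's pitch is penalized less than the identical tone placed in an empty band, which is the behavior a masking-aware loss is meant to encourage. A psychoacoustic model proper would use perceptual frequency scales and calibrated spreading functions instead of a fixed neighborhood and offset.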
Why It Matters
XAttnMark helps protect audio IP and authenticate content as deepfake and generative-editing threats grow.