Discrete optimal transport is a strong audio adversarial attack
A new post-processing method bypasses modern anti-spoofing systems by aligning the embeddings of AI-generated speech with the distribution of real speech.
Researchers Anton Selitskiy, Akib Shahriyar, and Jishnuraj Prakasan developed kDOT-VC, a discrete optimal transport voice conversion method. It works as a black-box adversarial attack: it aligns frame-level WavLM embeddings of synthetic speech with a pool of real-speech embeddings using entropic optimal transport and a top-k barycentric projection, then decodes the aligned embeddings with a neural vocoder. The method demonstrates stronger domain adaptation than kNN-VC, SinkVC, and Gaussian OT, and it effectively fools deployed countermeasures.
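The alignment step described above can be illustrated with a minimal NumPy sketch. It is not the authors' implementation: the cosine cost, the uniform frame weights, the regularization strength `eps`, the value of `k`, and all function names are assumptions chosen for illustration, and random vectors stand in for WavLM frame embeddings. The vocoder stage is omitted.

```python
import numpy as np

def cosine_cost(x, y):
    # Pairwise cosine distance between rows of x (n, d) and y (m, d).
    xn = x / np.linalg.norm(x, axis=1, keepdims=True)
    yn = y / np.linalg.norm(y, axis=1, keepdims=True)
    return 1.0 - xn @ yn.T

def sinkhorn_plan(cost, eps=0.1, n_iters=200):
    # Entropic optimal transport via Sinkhorn iterations.
    # Assumption: every source and target frame carries uniform mass.
    n, m = cost.shape
    a = np.full(n, 1.0 / n)
    b = np.full(m, 1.0 / m)
    K = np.exp(-cost / eps)          # Gibbs kernel
    u = np.ones(n)
    v = np.ones(m)
    for _ in range(n_iters):
        u = a / (K @ v)
        v = b / (K.T @ u)
    return u[:, None] * K * v[None, :]   # transport plan, shape (n, m)

def topk_barycentric_projection(plan, targets, k=4):
    # For each source frame, keep its k heaviest couplings, renormalize
    # them, and return the weighted average of those target frames.
    idx = np.argpartition(-plan, k - 1, axis=1)[:, :k]
    w = np.take_along_axis(plan, idx, axis=1)
    w = w / w.sum(axis=1, keepdims=True)
    return np.einsum("nk,nkd->nd", w, targets[idx])

# Hypothetical stand-ins for frame-level embeddings.
rng = np.random.default_rng(0)
synth = rng.standard_normal((50, 16))    # synthetic-speech frames
real = rng.standard_normal((200, 16))    # pool of real-speech frames

plan = sinkhorn_plan(cosine_cost(synth, real))
mapped = topk_barycentric_projection(plan, real, k=4)
print(mapped.shape)
```

Each row of `mapped` is a convex combination of k real-speech frames, which is what pulls the synthetic embeddings toward the real-speech distribution before decoding. Restricting the projection to the top k couplings, rather than averaging over the whole plan row, keeps each output frame close to a few actual real frames instead of a blurred mean.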
Why It Matters
The attack exposes critical vulnerabilities in the voice authentication and deepfake detection systems used for security and fraud prevention.