Audio & Speech

New voice privacy attack uses BERT to identify speakers from text alone

arXiv eess.AS May 21, 2026

⚡ BERT-based ASV reaches a 35% mean EER, and as low as 2% for some speakers, using only linguistic content

Deep Dive

A team led by Ünal Ege Gaznepoglu of FAU Erlangen-Nuremberg, with partners at Fraunhofer IIS, Saarland University, and other institutions, has demonstrated a voice privacy attack that relies on linguistic content rather than acoustic features. Their method, presented at INTERSPEECH 2025, adapts BERT, a language model, to act as an automatic speaker verification (ASV) system. On the standard VoicePrivacy datasets, the attack achieved a mean equal error rate (EER) of 35%. A lower EER means a more successful attack, and some speakers were identified with EERs as low as 2% based purely on what they said, not how they said it.

The attack's success stems from intra-speaker linguistic similarity: individuals tend to use consistent vocabulary, phrasing, and semantic patterns. The researchers' explainability study linked the model's decisions to semantically similar keywords across utterances, a bias introduced by how the LibriSpeech dataset is curated. This finding exposes a critical flaw in current speaker anonymization evaluations, which assume that removing voice biometrics (pitch, timbre, etc.) guarantees privacy. The authors call for reworking the VoicePrivacy datasets to ensure fair and unbiased evaluation, and they caution against relying solely on global EER metrics for privacy assessments.
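To make the caution about global EER concrete, here is a minimal sketch of how an aggregate figure can hide a fully exposed speaker. It is not the authors' evaluation code: the function names (`equal_error_rate`, `per_speaker_eer`), the speaker labels, and every score below are invented for illustration, and the threshold sweep is only a simple approximation of EER on a finite trial list.

```python
from collections import defaultdict


def equal_error_rate(scores, labels):
    """Approximate the EER for similarity scores (higher = more likely same speaker).

    labels: 1 for target (same-speaker) trials, 0 for non-target trials.
    """
    targets = [s for s, y in zip(scores, labels) if y == 1]
    nontargets = [s for s, y in zip(scores, labels) if y == 0]
    if not targets or not nontargets:
        raise ValueError("need both target and non-target trials")

    best_gap, eer = None, None
    # Try every observed score as a threshold; a trial is accepted if score >= threshold.
    for threshold in sorted(set(scores)) + [float("inf")]:
        far = sum(s >= threshold for s in nontargets) / len(nontargets)  # false accepts
        frr = sum(s < threshold for s in targets) / len(targets)         # false rejects
        gap = abs(far - frr)
        if best_gap is None or gap < best_gap:
            best_gap, eer = gap, (far + frr) / 2
    return eer


def per_speaker_eer(trials):
    """trials: iterable of (speaker_id, score, label). Returns {speaker_id: EER}."""
    by_speaker = defaultdict(list)
    for speaker, score, label in trials:
        by_speaker[speaker].append((score, label))
    return {
        speaker: equal_error_rate([s for s, _ in rows], [y for _, y in rows])
        for speaker, rows in by_speaker.items()
    }


# Invented toy trials: speaker "A" is perfectly separable, speaker "B" is at chance.
trials = [
    ("A", 0.9, 1), ("A", 0.8, 1), ("A", 0.2, 0), ("A", 0.1, 0),
    ("B", 0.4, 1), ("B", 0.6, 1), ("B", 0.5, 0), ("B", 0.7, 0),
]

per_speaker = per_speaker_eer(trials)
pooled = equal_error_rate([s for _, s, _ in trials], [y for _, _, y in trials])
mean_eer = sum(per_speaker.values()) / len(per_speaker)

print("per-speaker:", per_speaker)
print("pooled:", pooled, "mean:", mean_eer)
```

In this toy setup the aggregate EER looks moderate even though one speaker is verified without error, which is the kind of per-speaker exposure a single global figure cannot show.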

Key Points

BERT-based ASV achieved 35% mean EER on VoicePrivacy datasets, with some speakers at 2% EER using only text
Attack exploits intra-speaker linguistic similarity—consistent vocabulary and phrasing patterns—not acoustic features
Study reveals dataset bias in LibriSpeech and recommends reworking evaluation benchmarks for voice privacy

Why It Matters

Voice anonymization systems may fail if attackers can profile speakers by what they say, not just how they sound.
