Audio & Speech

Accent features extracted without labels outperform SSL embeddings in Brazilian Portuguese

A novel acoustic-only method beats self-supervised models on dialect classification.

Deep Dive

A team led by Pedro H. L. Leite at the Federal University of Rio de Janeiro (UFRJ) has developed a method for extracting regional accent features in Brazilian Portuguese without relying on sociolinguistic labels. Large self-supervised learning (SSL) speech models tend to dilute sociophonetic information because accent labels are unreliable or absent. Rather than depend on them, the researchers use ZIPA, a phoneme-based forced aligner, to isolate explicit regional accent landmarks. This acoustic-only approach yields a targeted feature set that captures dialectal variance more effectively than utterance embeddings from general-purpose SSL architectures.
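To make the idea of isolating accent landmarks concrete, here is a minimal Python sketch. It is not the authors' pipeline: the `Segment` type, the `LANDMARK_PHONES` set, and both helper functions are hypothetical names introduced for illustration, and the phones chosen are only an illustrative guess at sounds that might vary by region. The sketch assumes a forced aligner has already produced phone labels with start and end times in seconds.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Segment:
    """One aligned phone: label plus start/end time in seconds (hypothetical type)."""
    phone: str
    start: float
    end: float


# Hypothetical landmark inventory, chosen only for illustration.
LANDMARK_PHONES = frozenset({"s", "ʃ", "ɾ", "x", "h"})


def select_landmarks(alignment, landmarks=LANDMARK_PHONES):
    """Keep only the aligned segments whose phone is in the landmark set."""
    return [seg for seg in alignment if seg.phone in landmarks]


def to_sample_ranges(segments, sample_rate):
    """Convert segment times to (first_sample, end_sample) index pairs."""
    return [
        (round(seg.start * sample_rate), round(seg.end * sample_rate))
        for seg in segments
    ]


if __name__ == "__main__":
    alignment = [
        Segment("m", 0.00, 0.08),
        Segment("a", 0.08, 0.20),
        Segment("ʃ", 0.20, 0.31),
    ]
    hits = select_landmarks(alignment)
    print(hits)
    print(to_sample_ranges(hits, sample_rate=16000))
```

The resulting sample ranges could then be used to slice the waveform, so that any downstream acoustic measurement is computed only on the landmark regions rather than on the whole utterance.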

The study, submitted to the XLIV Brazilian Symposium on Telecommunications and Signal Processing (SBrT 2026), demonstrates that localized features can outperform broader, less-constrained models on accent-related tasks. Because the workflow uses only acoustic labels (e.g., phoneme boundaries and durations), it needs only minimal, objective data and reduces the need for expensive manual labeling. For speech processing in Brazilian Portuguese, this could support more scalable and accurate dialect identification, accent adaptation in ASR systems, and sociolinguistic research, all without the usual data bottleneck of sociolinguistic annotation.
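As a rough illustration of how phoneme boundaries and durations can become a fixed-length feature vector, consider the sketch below. The summary does not describe the study's actual feature set, so the `duration_features` function, its `phone_inventory` parameter, and the choice of mean duration per phone are all assumptions made for this example.

```python
from collections import defaultdict
from statistics import mean


def duration_features(alignment, phone_inventory):
    """Mean duration (seconds) per phone, in the order given by phone_inventory.

    alignment: iterable of (phone, start, end) tuples, times in seconds.
    Phones absent from the utterance get 0.0 (an arbitrary choice here).
    """
    durations = defaultdict(list)
    for phone, start, end in alignment:
        durations[phone].append(end - start)
    return [
        mean(durations[phone]) if phone in durations else 0.0
        for phone in phone_inventory
    ]


if __name__ == "__main__":
    alignment = [("a", 0.00, 0.10), ("s", 0.10, 0.25), ("a", 0.25, 0.45)]
    print(duration_features(alignment, phone_inventory=["a", "s", "ɾ"]))
```

Fixing the order of `phone_inventory` keeps the vector layout identical across utterances, which is what lets a standard classifier consume it. Only the aligner output is required as input; no accent label is needed at this stage.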

Key Points
  • Method uses the ZIPA forced aligner to extract accent landmarks from acoustic information alone; no sociolinguistic labels are required.
  • Targeted feature set outperforms utterance embeddings from large SSL models on accent classification in Brazilian Portuguese.
  • Workflow was submitted to SBrT 2026 and relies solely on phoneme-level acoustic data, reducing labeling costs.

Why It Matters

Enables scalable accent recognition without expensive labeling, benefiting Brazilian Portuguese ASR and sociophonetics.