Audio & Speech

VoxKnesset: A Large-Scale Longitudinal Hebrew Speech Dataset for Aging Speaker Modeling

New dataset reveals speaker verification error rates double over 15 years as voices age, challenging current AI models.

Deep Dive

A research team led by Yanir Marmor from the Weizmann Institute of Science and Reichman University has introduced VoxKnesset, a groundbreaking longitudinal speech dataset addressing a fundamental challenge in voice AI: how human voices change with age. The dataset contains approximately 2,300 hours of Hebrew parliamentary speech spanning 2009-2025, featuring 393 speakers with recording spans up to 15 years—making it one of the largest longitudinal speech collections available. Each audio segment includes aligned transcripts and verified demographic metadata from official Israeli Knesset records, providing researchers with unprecedented temporal depth for studying vocal aging patterns.

The team benchmarked modern speech embeddings including WavLM-Large, ECAPA-TDNN, and Wav2Vec2-XLSR-1B on age prediction and speaker verification tasks, revealing critical insights about voice aging. Speaker verification Equal Error Rate (EER) increased from 2.15% to 4.58% over 15 years for the strongest model, demonstrating how aging degrades biometric accuracy. Crucially, cross-sectionally trained age regressors failed to capture within-speaker aging patterns, while models trained longitudinally successfully recovered meaningful temporal signals. The researchers are publicly releasing both the dataset and processing pipeline to support development of aging-robust speech systems and advance Hebrew speech processing research, which has historically lacked large-scale resources.

Key Points
  • Contains 2,300 hours of Hebrew parliamentary speech from 393 speakers tracked for up to 15 years (2009-2025)
  • Speaker verification error rates double from 2.15% to 4.58% over 15-year spans, showing aging's impact on voice biometrics
  • Openly released dataset and pipeline will enable development of aging-robust speech AI systems and advance Hebrew NLP

Why It Matters

Enables development of voice AI that remains accurate as users age, crucial for authentication systems and assistive technologies.