Audio & Speech

New AI dataset LibriQuote offers 5.3K hours of expressive speech for TTS

arXiv eess.AS April 22, 2026

⚡LibriQuote’s 5.3K-hour expressive speech dataset boosts TTS expressiveness by up to 40% in benchmark tests.

Deep Dive

A team of researchers from France (Gaspard Michel, Elena V. Epure, and Christophe Cerisara) has released LibriQuote, a first-of-its-kind dataset designed to improve expressive text-to-speech (TTS) systems. The dataset consists of 5.3K hours of human-narrated audiobook speech, curated to capture the prosodic variations that occur when narrators shift between neutral storytelling and emotive character dialogue. Each character quote is annotated with contextual pseudo-labels describing its delivery style (e.g., “whispered softly”), so models can learn more nuanced speech synthesis.

In benchmarking on LibriQuote-test, fine-tuning a flow-matching TTS model on this data yielded substantial improvements in both expressivity and intelligibility. Training an autoregressive TTS model from scratch on LibriQuote also significantly enhanced its ability to generate expressive speech. The team has open-sourced the dataset, code, and evaluation tools to accelerate research and reproducibility. Audio samples and project links are available on Hugging Face Spaces and Replicate, making it easy for developers to test and iterate.

Key Points

LibriQuote: 5.3K hours of expressive audiobook speech with pseudo-labels for delivery style
Fine-tuned flow-matching TTS models showed up to 40% gains in expressivity and intelligibility on LibriQuote-test
All data, code, and benchmarks are publicly available on Hugging Face and Replicate

Why It Matters

LibriQuote gives TTS developers expressive training data and a shared benchmark, which could bring more natural emotional depth to AI voices in podcasts, audiobooks, and virtual assistants.
