Audio & Speech

How Open is Open TTS? A Practical Evaluation of Open Source TTS Tools for Romanian

A new study reveals major setup and efficiency challenges for open-source TTS in low-resource languages.

Deep Dive

A team of researchers from the University Politehnica of Bucharest, including Teodora Răgman, Adrian Bogdan Stânea, Horia Cucu, and Adriana Stan, has published a comprehensive evaluation titled "How Open is Open TTS?" in IEEE Access. The study offers a practical, multi-dimensional assessment of four widely used open-source text-to-speech architectures (FastPitch, VITS, Grad-TTS, and Matcha-TTS) applied specifically to Romanian speech synthesis. The analysis goes beyond output quality to examine often-overlooked practical barriers to adoption, such as complex installation, demanding dataset preparation pipelines, and substantial hardware requirements. These factors matter most to developers and researchers working with under-resourced languages, where computational budgets and technical expertise may be limited.

The evaluation combined objective metrics with subjective listening tests to measure the intelligibility, speaker similarity, and naturalness of the generated Romanian speech. Although the paper details how each model performs, its core finding is a warning: the promise of "open" TTS is undermined by steep practical challenges in the toolchain. Difficult data preprocessing and computational inefficiency create substantial friction and can exclude smaller teams or language communities. To lower that barrier and make their results verifiable, the authors have released all code and data in a public Git repository, grounding their analysis in fully reproducible protocols. The work aims to establish clearer best practices and, ultimately, to support more equitable and linguistically diverse innovation in speech synthesis.

Key Points
  • The study evaluated four architectures for Romanian TTS (FastPitch, VITS, Grad-TTS, and Matcha-TTS), assessing both the practical setup experience and the measured quality of the synthesized speech.
  • It found major practical hurdles, including difficult installation, complex data preprocessing, and high computational demands, that hinder use in low-resource contexts.
  • It provides fully reproducible code and data to help establish best practices for more inclusive, language-diverse TTS development beyond English.

Why It Matters

Highlights the real-world barriers to applying state-of-the-art speech synthesis to the world's less-resourced languages, and points toward more equitable tool development.