Open Source

MOSS-TTS v1.5 adds 11 languages and explicit pause control

r/LocalLLaMA May 26, 2026

⚡Now supports 31 languages with language tags and inline pause markers like '[pause 3.2s]'

Deep Dive

OpenMOSS has released MOSS-TTS-v1.5, an upgraded version of its open-source text-to-speech model. The update focuses on three core improvements: expanded multilingual support, more reliable voice cloning, and fine-grained prosody control. The model now supports 31 languages, adding Cantonese, Dutch, Finnish, Hindi, Macedonian, Malay, Romanian, Swahili, Tagalog, Thai, and Vietnamese to the original 20. When the language field is omitted, quality varies by language; when the language is tagged explicitly (e.g., language="French"), v1.5 outperforms v1.0 on nearly all languages.

Voice cloning in v1.5 is more stable: speaker similarity is improved and repeated generations show less variance. The model also handles a long reference audio clip paired with short target text better than before. A standout addition is explicit pause control: users can insert inline markers like "[pause 3.2s]" to force timed silences (e.g., before a dramatic word). Punctuation-driven prosody is more consistent as well, especially in long sentences. Alongside the update, the team released MOSS-SoundEffect-v2.0 for parallel sound effects generation.
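To make the pause marker format concrete, here is a minimal Python sketch that finds markers of the form "[pause 3.2s]" in a script and splits it into text and pause parts. It is a preprocessing illustration only, not part of MOSS-TTS: the helper names (split_pauses, total_pause_seconds) are hypothetical, and the marker grammar is assumed from the single example quoted above. The model itself interprets the markers; nothing here calls it.

```python
import re

# Assumed marker grammar, inferred from the one example "[pause 3.2s]":
# the word "pause", whitespace, a decimal number, then "s" for seconds.
PAUSE_RE = re.compile(r"\[pause\s+(\d+(?:\.\d+)?)s\]")


def split_pauses(text):
    """Split text into ("text", str) and ("pause", seconds) parts, in order."""
    parts = []
    pos = 0
    for match in PAUSE_RE.finditer(text):
        chunk = text[pos:match.start()].strip()
        if chunk:
            parts.append(("text", chunk))
        parts.append(("pause", float(match.group(1))))
        pos = match.end()
    tail = text[pos:].strip()
    if tail:
        parts.append(("text", tail))
    return parts


def total_pause_seconds(text):
    """Sum the durations of all pause markers found in text."""
    return sum(value for kind, value in split_pauses(text) if kind == "pause")


if __name__ == "__main__":
    script = "And the winner is [pause 3.2s] you."
    for part in split_pauses(script):
        print(part)
    print(total_pause_seconds(script))
```

A helper like this could be used to check a script before synthesis, for example to estimate how much silence the markers add to the final audio.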

Key Points

Supports 31 languages, 11 of them new, including Cantonese, Hindi, Thai, and Swahili
Explicit pause control with inline markers like "[pause 3.2s]" for timed breaks
More stable voice cloning with reduced variance and improved speaker similarity

Why It Matters

Creators and developers get an open-source TTS model that covers 31 languages and lets them set pause timing explicitly, which is useful for professional audio production.
