Open Source

MOSS-TTS v1.5 adds 11 languages and explicit pause control

Now supports 31 languages with language tags and inline pause markers like '[pause 3.2s]'

Deep Dive

OpenMOSS has launched MOSS-TTS-v1.5, an upgraded version of its open-source text-to-speech model. The update focuses on three core improvements: expanded multilingual support, more reliable voice cloning, and fine-grained prosody control. The model now supports 31 languages, adding Cantonese, Dutch, Finnish, Hindi, Macedonian, Malay, Romanian, Swahili, Tagalog, Thai, and Vietnamese to the original 20. When the language field is omitted, performance varies by language; when the language is tagged explicitly (e.g., language="French"), v1.5 outperforms v1.0 on nearly all languages.

Voice cloning in v1.5 is more stable: speaker similarity is improved and repeated generations show less variance. The model also copes better than before when the reference audio is long and the target text is short. A standout addition is explicit pause control: users can insert inline markers like "[pause 3.2s]" to force timed silences (e.g., before a dramatic word). Punctuation-driven prosody is also more consistent, especially in long sentences. The team also released MOSS-SoundEffect-v2.0 for parallel sound effects generation.
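
As a rough illustration of the inline pause markers, the Python sketch below builds and inspects script text that contains them. It is not MOSS-TTS code and does not call the model: the helper names (pause_marker, total_pause_seconds, strip_pauses) are hypothetical, and the marker grammar is assumed from the single "[pause 3.2s]" example above, so the syntax the model actually accepts may differ.

```python
import re

# Marker shape inferred from the article's example "[pause 3.2s]".
# Assumption: a non-negative decimal number followed by "s".
PAUSE_RE = re.compile(r"\[pause (\d+(?:\.\d+)?)s\]")


def pause_marker(seconds: float) -> str:
    """Format a pause marker for the given duration (hypothetical helper)."""
    if seconds <= 0:
        raise ValueError("pause duration must be positive")
    # ':g' drops trailing zeros, so 3.2 -> "3.2" and 2.0 -> "2".
    return f"[pause {seconds:g}s]"


def total_pause_seconds(text: str) -> float:
    """Sum the durations of all pause markers found in the text."""
    return sum(float(value) for value in PAUSE_RE.findall(text))


def strip_pauses(text: str) -> str:
    """Remove pause markers and collapse the whitespace left behind."""
    return " ".join(PAUSE_RE.sub(" ", text).split())


# Place a timed silence before a dramatic word.
script = f"And the winner is {pause_marker(3.2)} Alice."
print(script)
print(total_pause_seconds(script))
print(strip_pauses(script))
```

A helper like total_pause_seconds can be handy for estimating how much silence a script adds before sending it to synthesis, and strip_pauses gives a marker-free version for captions or logs.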

Key Points
  • Supports 31 languages, 11 of them new, including Cantonese, Hindi, Thai, and Swahili
  • Explicit pause control with inline markers like "[pause 3.2s]" for timed breaks
  • More stable voice cloning with reduced variance and improved speaker similarity

Why It Matters

Creators and developers now get an open-source, multilingual TTS model with explicit control over pause timing, which makes it better suited to professional audio production.