Image & Video

KaniTTS2 - open-source 400M TTS model with voice cloning, runs in 3GB VRAM. Pretrain code included.

This open-source voice AI could make high-quality TTS accessible to everyone.

Deep Dive

The open-source KaniTTS2 model has been released, featuring real-time voice cloning and multilingual support. The 400M-parameter model runs in just 3GB of VRAM with a real-time factor of 0.2 on modern GPUs, meaning it generates speech about five times faster than playback. Crucially, the team is also releasing the complete pretraining framework, so anyone can train custom TTS models for specific languages or accents. Pretraining used 10k hours of speech and took just 6 hours on 8x H100s.
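To make the figures above concrete, here is a minimal arithmetic sketch. It does not use KaniTTS2's code or API; the helper names (real_time_factor, synthesis_time, gpu_hours) are hypothetical and exist only for illustration. It assumes the usual definition of real-time factor (time spent generating divided by the duration of the generated audio) and that the 6-hour figure is wall-clock time across all 8 GPUs.

```python
def real_time_factor(synthesis_seconds: float, audio_seconds: float) -> float:
    """RTF = time spent generating / duration of the generated audio."""
    if audio_seconds <= 0:
        raise ValueError("audio_seconds must be positive")
    return synthesis_seconds / audio_seconds


def synthesis_time(audio_seconds: float, rtf: float) -> float:
    """Estimated seconds needed to generate a clip of the given length."""
    return audio_seconds * rtf


def gpu_hours(wall_clock_hours: float, num_gpus: int) -> float:
    """Total GPU-hours, assuming every GPU is busy for the whole run."""
    return wall_clock_hours * num_gpus


if __name__ == "__main__":
    rtf = 0.2  # figure quoted for modern GPUs

    # A 10-second clip at RTF 0.2 takes about 2 seconds to generate.
    print(f"10 s clip: ~{synthesis_time(10.0, rtf):.1f} s to synthesize")

    # An RTF below 1.0 means faster than real time; the speedup is 1 / RTF.
    print(f"speedup over playback: ~{1 / rtf:.0f}x")

    # 6 wall-clock hours on 8 GPUs (assumption: all 8 used throughout).
    print(f"pretraining budget: ~{gpu_hours(6, 8):.0f} GPU-hours")
```

Under those assumptions, the reported pretraining run comes to roughly 48 H100 GPU-hours, which is the number to multiply by a cloud provider's hourly rate when estimating the cost of training a custom model.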

Why It Matters

Releasing the pretraining code alongside a model that fits in 3GB of VRAM democratizes professional-grade voice synthesis, putting custom models for a specific language or accent within reach at low cost.