
Open-source KaniTTS2 clones voices in real-time, runs on just 3GB VRAM

This open-source voice AI could make high-quality TTS accessible to everyone.

Deep Dive

The open-source KaniTTS2 model has been released, featuring real-time voice cloning and multilingual support. The 400M-parameter model runs on just 3GB of VRAM and achieves a real-time factor (RTF) of 0.2 on modern GPUs, meaning it generates audio roughly five times faster than it takes to play back. Crucially, the team is also releasing the complete pretraining framework, so anyone can train custom TTS models for specific languages or accents. The model was pretrained on 10k hours of speech, and training took just 6 hours on 8x H100 GPUs.
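To make the headline numbers concrete, here is a small back-of-the-envelope sketch. It uses the standard definition of real-time factor (synthesis time divided by the duration of the audio produced) and treats training compute as GPU count times wall-clock hours. The helper names are hypothetical and are not part of KaniTTS2 or its framework; they only illustrate the arithmetic behind the reported figures.

```python
# Hypothetical helpers for interpreting the reported figures; not KaniTTS2 APIs.

def real_time_factor(synthesis_seconds: float, audio_seconds: float) -> float:
    """RTF = wall-clock time spent synthesizing / duration of the audio produced."""
    if audio_seconds <= 0:
        raise ValueError("audio_seconds must be positive")
    return synthesis_seconds / audio_seconds


def synthesis_time(audio_seconds: float, rtf: float) -> float:
    """Expected wall-clock time to generate a clip of the given length at a given RTF."""
    return audio_seconds * rtf


def gpu_hours(num_gpus: int, wall_clock_hours: float) -> float:
    """Total training compute expressed in GPU-hours."""
    return num_gpus * wall_clock_hours


# At the reported RTF of 0.2, a 10-second clip takes about 2 seconds to generate.
print(synthesis_time(10.0, 0.2))

# 8 GPUs for 6 hours of training corresponds to 48 GPU-hours of compute.
print(gpu_hours(8, 6))
```

An RTF below 1.0 means generation outpaces playback, which is what makes streaming, real-time use feasible; the lower the RTF, the more headroom remains for latency-sensitive applications.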

Why It Matters

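Lowers the barrier to professional-grade voice synthesis: with the full pretraining framework released and modest hardware requirements, teams can train and run custom models for their own languages or accents at relatively low cost.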
