Open-source KaniTTS2 clones voices in real-time, runs on just 3GB VRAM

r/StableDiffusion February 15, 2026

⚡ This open-source voice AI could make high-quality TTS accessible to everyone.

Deep Dive

The open-source KaniTTS2 model has been released, featuring real-time voice cloning and multilingual support. The 400M-parameter model runs in just 3GB of VRAM and reaches a real-time factor (RTF) of 0.2 on modern GPUs, meaning it generates speech about five times faster than the audio plays back. Crucially, the team is also releasing its complete pretraining framework, so anyone can train custom TTS models for specific languages or accents. The model was pretrained on 10k hours of speech and trained in just 6 hours on 8x H100 GPUs.
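To make the headline numbers concrete, here is a small back-of-envelope sketch. It assumes the standard definition RTF = synthesis time / duration of audio produced, and takes the 0.2 RTF and the "6 hours on 8x H100s" figures at face value from the report. The 10-second clip is a hypothetical input, and none of these helper functions are part of KaniTTS2 or its framework.

```python
"""Back-of-envelope arithmetic for the reported KaniTTS2 figures (illustrative only)."""


def synthesis_seconds(audio_seconds: float, rtf: float) -> float:
    """Wall-clock time to synthesize a clip, using RTF = synthesis time / audio duration."""
    if audio_seconds < 0 or rtf <= 0:
        raise ValueError("audio_seconds must be >= 0 and rtf must be > 0")
    return audio_seconds * rtf


def is_faster_than_real_time(rtf: float) -> bool:
    """An RTF below 1.0 means audio is generated faster than it plays back."""
    return rtf < 1.0


def gpu_hours(num_gpus: int, wall_clock_hours: float) -> float:
    """Total accelerator time consumed by a training run."""
    return num_gpus * wall_clock_hours


if __name__ == "__main__":
    rtf = 0.2    # RTF reported for KaniTTS2 on modern GPUs
    clip = 10.0  # hypothetical 10-second clip
    print(f"{clip:.0f} s of audio -> ~{synthesis_seconds(clip, rtf):.1f} s to generate")
    print(f"Speed-up over playback: {1 / rtf:.0f}x")
    print(f"Faster than real time: {is_faster_than_real_time(rtf)}")
    print(f"Training budget: {gpu_hours(8, 6):.0f} H100 GPU-hours")
```

Under these assumptions, a 10-second clip takes about 2 seconds to generate (a 5x speed-up), and the reported training run works out to roughly 48 H100 GPU-hours. Actual latency will vary with hardware, batch size, and clip length.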

Why It Matters

Democratizes professional-grade voice synthesis, enabling custom models for any language or accent at minimal cost.
