Open Source

KaniTTS2: New 400M-parameter model achieves real-time speech with 0.2 RTF

r/LocalLLaMA February 14, 2026

⚡ This open-source TTS model was pretrained in just 6 hours on 8x H100 GPUs and supports voice cloning.

Deep Dive

KaniTTS2 is a new 400M-parameter text-to-speech model optimized for real-time conversational AI. It achieves a Real-Time Factor (RTF) of 0.2 on an RTX 5080 using just 3GB of VRAM. An RTF below 1 means audio is generated faster than it plays back, so the model is fast enough for live applications. The model supports voice cloning and was pretrained on 10k hours of speech data in only 6 hours using 8x H100 GPUs. It is multilingual (English, Spanish, Kyrgyz), and the full pretraining code is released under Apache 2.0.
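To make the headline numbers concrete, here is a minimal Python sketch of the arithmetic behind them. It assumes the common definition of RTF (synthesis time divided by the duration of the generated audio); the post does not spell out how its 0.2 figure was measured, and the function names below are illustrative, not part of KaniTTS2.

```python
def real_time_factor(synthesis_seconds: float, audio_seconds: float) -> float:
    """RTF under the common definition: compute time / audio duration.

    Assumption: this is the definition behind the reported 0.2 figure.
    """
    if audio_seconds <= 0:
        raise ValueError("audio_seconds must be positive")
    return synthesis_seconds / audio_seconds


def synthesis_time(audio_seconds: float, rtf: float) -> float:
    """Estimated compute time needed to generate a clip at a given RTF."""
    return audio_seconds * rtf


def gpu_hours(num_gpus: int, wall_clock_hours: float) -> float:
    """Total GPU-hours for a training run."""
    return num_gpus * wall_clock_hours


if __name__ == "__main__":
    # At the reported RTF of 0.2, a 10-second reply takes roughly 2 seconds.
    print(f"10 s of audio at RTF 0.2: {synthesis_time(10.0, 0.2):.1f} s")
    # Reported pretraining budget: 8 GPUs for 6 hours of wall-clock time.
    print(f"Pretraining budget: {gpu_hours(8, 6):.0f} GPU-hours")
```

Any RTF below 1.0 leaves headroom for the rest of a voice pipeline (speech recognition, the language model, network latency), which is why a figure like 0.2 matters for conversational use.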

Why It Matters

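With the pretraining code released under Apache 2.0 and a full pretraining run taking 6 hours on 8x H100 GPUs, it lowers the barrier for teams that want to build custom, real-time voice AI for their own languages or accents.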
