Open Source

Open-source KaniTTS2 clones voices in real-time using just 3GB VRAM

This 400M-parameter model could democratize voice AI for any language.

Deep Dive

The open-source KaniTTS2 text-to-speech model has been released with real-time voice cloning and multilingual support. The 400M-parameter model runs on just 3GB of VRAM and reaches a real-time factor of 0.2 on high-end GPUs, meaning it generates one second of audio in roughly 0.2 seconds. It was trained on 10,000 hours of speech data in just 6 hours using eight H100 GPUs. Critically, the team is also releasing the complete pretraining code, so anyone can train a custom TTS model for a specific language or accent.
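Real-time factor (RTF) is the time spent synthesizing divided by the duration of the audio produced, so values below 1.0 mean speech comes out faster than it plays. The short Python sketch applies that arithmetic to the figures reported in the article. The function names and the 30-second example clip are illustrative assumptions, not part of KaniTTS2 or its code.

```python
def real_time_factor(processing_seconds: float, audio_seconds: float) -> float:
    """Real-time factor: processing time divided by the duration of audio produced.

    An RTF below 1.0 means speech is generated faster than it plays back.
    """
    if audio_seconds <= 0:
        raise ValueError("audio_seconds must be positive")
    return processing_seconds / audio_seconds


def processing_time(audio_seconds: float, rtf: float) -> float:
    """Estimate how long synthesis takes for a clip of a given length at a given RTF."""
    return audio_seconds * rtf


# Figures reported in the article.
REPORTED_RTF = 0.2
TRAINING_GPUS = 8
TRAINING_HOURS = 6

# A hypothetical 30-second clip, chosen only for illustration.
clip_seconds = 30.0
estimated = processing_time(clip_seconds, REPORTED_RTF)
print(f"{clip_seconds:.0f} s of speech takes about {estimated:.1f} s to generate")

# Total training compute: GPUs multiplied by wall-clock hours.
print(f"Training budget: {TRAINING_GPUS * TRAINING_HOURS} GPU-hours")
```

The same relationship works in reverse: timing a real synthesis run and passing the elapsed time and clip length to real_time_factor gives the RTF for your own hardware, which may differ from the high-end GPU figure.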

Why It Matters

Because the model is small and its pretraining code is public, developers and communities can build high-quality, localized voice AI without massive computational resources or proprietary platforms.
