Open-source KaniTTS2 clones voices in real time, runs on just 3GB VRAM
This open-source voice AI could make high-quality TTS accessible to everyone.
The open-source KaniTTS2 model has been released, featuring real-time voice cloning and multilingual support. The 400M-parameter model runs on just 3GB of VRAM with a real-time factor of 0.2 on modern GPUs, meaning it generates audio roughly five times faster than the audio takes to play back. Crucially, the team is also releasing the complete pretraining framework, allowing anyone to train custom TTS models for specific languages or accents. The model was pretrained on 10k hours of speech, with training taking just 6 hours on 8x H100s.
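For readers unfamiliar with the metric, the arithmetic behind those figures can be sketched as follows. This assumes the usual definition of real-time factor (synthesis time divided by the duration of the audio produced); the function names are illustrative and are not part of KaniTTS2 or its framework.

```python
def real_time_factor(synthesis_seconds: float, audio_seconds: float) -> float:
    """Real-time factor: time spent generating divided by audio length.

    Values below 1.0 mean the model generates faster than playback.
    """
    if audio_seconds <= 0:
        raise ValueError("audio_seconds must be positive")
    return synthesis_seconds / audio_seconds


def synthesis_time(rtf: float, audio_seconds: float) -> float:
    """Estimated generation time for a clip of the given length."""
    return rtf * audio_seconds


def gpu_hours(wall_clock_hours: float, num_gpus: int) -> float:
    """Total accelerator time consumed by a training run."""
    return wall_clock_hours * num_gpus


if __name__ == "__main__":
    # At the reported RTF of 0.2, a 10-second clip takes about 2 seconds.
    print(synthesis_time(0.2, 10.0))
    # The reported run: 6 hours of wall-clock time on 8 GPUs.
    print(gpu_hours(6, 8))
```

By this arithmetic, the reported training run works out to 48 H100-hours in total.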
Why It Matters
The release democratizes professional-grade voice synthesis: with the pretraining framework public, anyone can train custom models for a specific language or accent at modest compute cost.