Kokoro TTS, but it clones voices now — Introducing KokoClone
The open-source model clones any voice from a 3-10 second clip and runs in real time on a CPU.
Developer Ashish Patnaik has launched KokoClone, an open-source upgrade to the popular Kokoro text-to-speech (TTS) engine. The new model, released under an Apache 2.0 license, adds zero-shot voice cloning: it can mimic a speaker's vocal timbre from a single short audio sample, with no per-speaker training. This fills a gap for users who liked Kokoro's prosody and multilingual support but wanted personalized voice output. A demo is available on Hugging Face, and the full source code and weights are on GitHub.
Technically, KokoClone uses a two-step system: the core Kokoro-TTS engine handles pronunciation, pacing, and emotional inflection across eight languages, while a separate cloning layer transfers the acoustic signature of the user's reference audio onto the generated speech. Because it is built on Kokoro's existing ONNX runtime stack, it keeps the original engine's speed and efficiency and can synthesize in real time even on consumer CPU hardware. The release includes a Gradio web interface, a CLI, and a simple Python API for integration, which positions it as an accessible alternative to closed-source voice cloning services. Because the code and weights are open, the community can extend it for multilingual, real-time voice applications.
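To make the two-step design and the "real-time" claim concrete, here is a minimal sketch of how such a pipeline could be composed and timed. It does not use KokoClone's actual API: `synthesize_base`, `transfer_timbre`, `clone_speech`, the 24 kHz sample rate, and the samples-per-character figure are all hypothetical stand-ins chosen for illustration. The real-time factor (RTF) is processing time divided by the duration of the audio produced; a value below 1.0 means synthesis runs faster than playback.

```python
import time
from typing import Callable, List

Audio = List[float]

SAMPLE_RATE = 24_000       # assumed output sample rate for this sketch
SAMPLES_PER_CHAR = 1_200   # illustrative: 0.05 s of audio per input character


def synthesize_base(text: str) -> Audio:
    """Stage 1 stand-in: a real TTS engine would set pronunciation and pacing here."""
    return [0.0] * (len(text) * SAMPLES_PER_CHAR)


def transfer_timbre(audio: Audio, reference: Audio) -> Audio:
    """Stage 2 stand-in: a real cloning layer would apply the reference timbre here."""
    return list(audio)  # same length as the input; only the voice would change


def clone_speech(text: str, reference: Audio) -> Audio:
    """Run both stages in order: base synthesis, then timbre transfer."""
    return transfer_timbre(synthesize_base(text), reference)


def real_time_factor(fn: Callable[..., Audio], *args) -> float:
    """Return processing time divided by the duration of the generated audio."""
    start = time.perf_counter()
    audio = fn(*args)
    elapsed = time.perf_counter() - start
    if not audio:
        raise ValueError("no audio generated; RTF is undefined")
    return elapsed / (len(audio) / SAMPLE_RATE)
```

Swapping the two stand-in functions for real model calls would leave `real_time_factor` unchanged, which makes it a simple way to check whether a given CPU keeps up with playback.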
- Adds zero-shot voice cloning to Kokoro TTS using just a 3-10 second .wav reference clip
- Runs in real time on CPU and retains Kokoro's multilingual support for 8 languages, including English, Hindi, and Japanese
- Fully open-source (Apache 2.0) with live demo, CLI, and simple Python API for immediate integration
Why It Matters
KokoClone gives developers a fast, open-source alternative to proprietary voice cloning APIs, one that is quick enough for real-time applications.