Added 8 Indian languages to Chatterbox TTS via LoRA — 1.4% of parameters, no phoneme engineering
Fine-tuned Resemble AI's Chatterbox to cover languages spoken by 500M+ people, with just 1.4% of parameters trained.
An independent researcher has successfully extended Resemble AI's open-source Chatterbox-Multilingual text-to-speech (TTS) model to support eight major Indian languages. The project, called chatterbox-indic-lora, adds Telugu, Kannada, Bengali, Tamil, Malayalam, Marathi, Gujarati, and Hindi to the existing 23-language model. This addresses a significant gap, as the original model lacked Dravidian languages and had limited Indo-Aryan coverage beyond Hindi, leaving over 500 million speakers without representation.
The technical approach cleverly avoided the conventional, labor-intensive method of building grapheme-to-phoneme (G2P) systems for each language. Instead, the researcher extended the model's BPE tokenizer with Indic script characters and used LoRA (Low-Rank Adaptation) adapters on the T3 backbone—a Llama-based module. This meant training only 7.8 million parameters (a rank-32 adapter on q/k/v/o projections), which is just 1.4% of the model's total 544 million parameters. Key techniques included initializing new character embeddings from phonetically equivalent Devanagari characters (a 'Brahmic warm-start') and training languages incrementally to prevent catastrophic forgetting.
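The arithmetic behind the 1.4% figure, and the shape of the two key tricks, can be sketched in a few lines. The hidden size and layer count below are illustrative guesses that happen to reproduce the reported 7.8M adapter parameters, not confirmed dimensions of the T3 backbone; the token IDs in the warm-start snippet are likewise hypothetical.

```python
import numpy as np

# --- LoRA parameter budget (illustrative dims, not confirmed T3 config) ---
hidden = 1024        # assumed hidden size of the Llama-based T3 backbone
layers = 30          # assumed number of transformer layers
rank = 32            # rank-32 adapter, as reported
projections = 4      # q, k, v, o attention projections
# each adapter is a pair of matrices A (rank x hidden) and B (hidden x rank)
lora_params = layers * projections * 2 * rank * hidden   # 7,864,320 ≈ 7.8M
total_params = 544_000_000
print(lora_params, round(lora_params / total_params, 4))  # -> 7864320 0.0145

# --- LoRA forward pass: W_eff = W + (alpha / rank) * B @ A ---
rng = np.random.default_rng(0)
W = rng.standard_normal((hidden, hidden)) * 0.02  # frozen base weight
A = rng.standard_normal((rank, hidden)) * 0.01    # trainable down-projection
B = np.zeros((hidden, rank))                      # zero-init: adapter starts as a no-op
alpha = 32
W_eff = W + (alpha / rank) * (B @ A)
assert np.allclose(W_eff, W)  # before training, outputs match the base model

# --- 'Brahmic warm-start': seed a new script character's embedding from
# --- a phonetically equivalent Devanagari character already in the vocab
vocab, dim = 16, 8
emb = rng.standard_normal((vocab, dim))
new_token_id, devanagari_id = 15, 3   # hypothetical IDs, e.g. Telugu 'క' <- 'क'
emb[new_token_id] = emb[devanagari_id]
```

Zero-initializing B is what makes LoRA safe to bolt onto a working model: the adapted weights start identical to the frozen base, and the warm-started embeddings give the new scripts a phonetically sensible starting point instead of random noise.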
Results show the method is largely effective. Character Error Rate (CER) evaluations using Whisper large-v3 ASR on held-out samples range from 0.1058 for Hindi (which improved from its baseline) to 0.2853 for Telugu. Malayalam, however, struggles with a CER of 0.8593, indicating it needs more data or dedicated tuning. The model currently supports two speaker voices per language from the IndicTTS dataset and does not yet handle code-mixing (e.g., Hindi+English sentences).
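Character Error Rate is just character-level edit distance between the ASR transcript and the reference text, divided by the reference length. A minimal reimplementation of the metric (not the project's actual evaluation script, which uses Whisper large-v3 transcripts as the hypothesis side) looks like:

```python
def levenshtein(a: str, b: str) -> int:
    """Classic dynamic-programming edit distance over characters."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,            # deletion
                           cur[j - 1] + 1,         # insertion
                           prev[j - 1] + (ca != cb)))  # substitution
        prev = cur
    return prev[-1]

def cer(reference: str, hypothesis: str) -> float:
    """Character Error Rate: edits needed / reference length."""
    return levenshtein(reference, hypothesis) / max(len(reference), 1)

print(cer("hello world", "helo world"))  # one deletion in 11 chars -> ~0.0909
```

On this scale, Hindi's 0.1058 means roughly one character error per ten reference characters, while Malayalam's 0.8593 means the transcript diverges from the reference almost everywhere.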
The work demonstrates a highly parameter-efficient path to expanding multilingual AI speech systems. By open-sourcing the model and a detailed write-up, the researcher provides a valuable blueprint for adding low-resource languages to large TTS models without full retraining. This approach could significantly accelerate the development of inclusive voice AI for diverse linguistic communities worldwide.
- Added 8 Indian languages to Chatterbox TTS by training only 7.8M parameters (1.4% of 544M total) using LoRA adapters.
- Used a 'Brahmic warm-start' technique, initializing new script character embeddings from phonetically equivalent Devanagari characters.
- Achieved intelligible speech for 7 languages, with Hindi CER improving to 0.1058, though Malayalam (CER 0.8593) needs more data.
Why It Matters
Provides a scalable blueprint for adding low-resource languages to AI speech models, making voice tech accessible to 500M+ more speakers.