MiniMax Unveils Speech 2.8, Enhancing AI Voice with Nuance and High-Fidelity Cloning
New AI voice model clones your vocal fingerprint in 10 seconds with studio clarity.
MiniMax has launched Speech 2.8, a significant upgrade to its AI voice synthesis model aimed at closing the gap between synthetic and human speech. The core innovation is 'Native Sound Tags', which model colloquial fillers like 'um,' 'uh,' and 'ah' alongside breaths and pauses. These give the AI a 'lived-in' quality, making it sound spontaneous and warm rather than robotic. The model captures the natural rhythm, pitch, and hesitations that convey emotion and emphasis; in one demo, the AI narrates like a casual human speaker, complete with chuckles and throat-clearing.
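Conceptually, sound tags like these would be embedded inline in the text sent to a synthesis endpoint. Below is a minimal Python sketch of that pattern; the endpoint path, tag syntax, and payload fields (`model`, `voice_id`, `text`) are illustrative assumptions, not MiniMax's documented interface — consult the MiniMax Open Platform docs for the real API.

```python
import os
import requests

API_KEY = os.environ["MINIMAX_API_KEY"]  # assumed auth scheme
# Hypothetical endpoint for illustration only.
ENDPOINT = "https://api.minimax.io/v1/text_to_speech"

payload = {
    "model": "speech-2.8",          # assumed model identifier
    "voice_id": "casual_narrator",  # assumed voice reference
    # Sound tags embedded inline so the model renders fillers,
    # breaths, and chuckles at the marked positions (tag syntax assumed).
    "text": "So, <um> I was thinking <breath> we could, <uh> start over? <laugh>",
}

resp = requests.post(
    ENDPOINT,
    headers={"Authorization": f"Bearer {API_KEY}"},
    json=payload,
    timeout=30,
)
resp.raise_for_status()

# Assume the response body is raw audio bytes.
with open("output.mp3", "wb") as f:
    f.write(resp.content)
```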
Speech 2.8 also excels at voice cloning: from just a 10-second audio sample, it replicates a person's unique texture, breathiness, and speaking pace, which MiniMax calls the 'vocal fingerprint.' The output is studio-grade, thanks to a re-engineered processing engine that strips background noise and digital artifacts. Cross-lingual performance is also improved for the Mandarin-Japanese pair, eliminating 'accent bleed' so cloned voices sound native in both languages. The model is available now via the MiniMax Open Platform and MiniMax Audio products.
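In practice, a 10-second cloning flow is typically two steps: upload a short reference clip, then synthesize with the returned voice ID. Here is a hedged Python sketch under that assumption; the endpoint names, request shapes, and the `voice_id` response field are hypothetical stand-ins, not confirmed MiniMax API details.

```python
import os
import requests

API_KEY = os.environ["MINIMAX_API_KEY"]
HEADERS = {"Authorization": f"Bearer {API_KEY}"}
BASE = "https://api.minimax.io/v1"  # illustrative base URL

# Step 1: upload a ~10-second reference sample (endpoint name assumed).
with open("reference_10s.wav", "rb") as f:
    clone_resp = requests.post(
        f"{BASE}/voice_clone",
        headers=HEADERS,
        files={"audio": f},
        timeout=60,
    )
clone_resp.raise_for_status()
voice_id = clone_resp.json()["voice_id"]  # assumed response field

# Step 2: synthesize new speech in the cloned voice.
tts_resp = requests.post(
    f"{BASE}/text_to_speech",
    headers=HEADERS,
    json={
        "model": "speech-2.8",
        "voice_id": voice_id,
        "text": "This sentence is spoken in the cloned voice.",
    },
    timeout=30,
)
tts_resp.raise_for_status()

with open("cloned_output.mp3", "wb") as f:
    f.write(tts_resp.content)
```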
- Native Sound Tags model fillers like 'um' and 'ah' for natural rhythm and emotional nuance.
- Voice cloning from just 10 seconds of audio captures unique texture, breathiness, and pace.
- Studio-grade noise elimination and improved cross-lingual quality for Mandarin-Japanese.
Why It Matters
Narrows the gap between AI and human speech, enabling more natural voice assistants, dubbing, and content creation.