Open Source

[RELEASE] - Finally, my first TTS model is out! 🎙️ Flare-TTS 28M

A single A6000 GPU, 300 epochs, and 28M parameters deliver free, open-source speech synthesis.

Deep Dive

Flare-TTS 28M is a text-to-speech model built entirely from scratch by independent developer LH-Tech_AI. It was trained for 300 epochs on the full LJSpeech dataset (~24 hours of audio) using just one NVIDIA A6000 GPU—a fraction of the compute typically required for modern TTS systems. The model has 28 million parameters, making it small enough to run on consumer hardware. The developer provides an example audio file on Hugging Face, noting the output still has a robotic quality but is functional for basic applications.

The model is released under an open-source license, allowing anyone to download, modify, or deploy it for free. While not matching the naturalness of larger commercial systems (e.g., ElevenLabs or OpenAI’s TTS), its low training cost and accessibility make it a valuable resource for hobbyists, researchers, or developers seeking offline speech synthesis. The project highlights how GPU-efficient training can democratize AI voice generation, even with limited resources.
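The "small enough to run on consumer hardware" claim is easy to sanity-check with arithmetic. Below is a minimal sketch of the weight-memory footprint, assuming float32 weights (4 bytes per parameter); actual checkpoints may ship in float16 or other formats, which would halve the figure.

```python
# Back-of-envelope memory estimate for a 28M-parameter model.
# Assumes float32 weights (4 bytes each) -- an assumption, since the
# checkpoint's actual dtype isn't stated in the release notes.
PARAMS = 28_000_000
BYTES_PER_PARAM = 4  # float32

weights_mb = PARAMS * BYTES_PER_PARAM / 1024**2
print(f"~{weights_mb:.0f} MB of weights")  # roughly 107 MB
```

At roughly a tenth of a gigabyte for weights alone, the model comfortably fits in the RAM of any modern laptop, which supports the offline/hobbyist use case described above.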

Key Points
  • Trained from scratch for 300 epochs on the ~24-hour LJSpeech dataset using a single NVIDIA A6000 GPU.
  • 28 million parameter model, open-source and free on Hugging Face for anyone to use or modify.
  • Produces English speech that is intelligible but still sounds somewhat robotic; suited for lightweight TTS needs.

Why It Matters

Demonstrates that functional TTS models can be trained cheaply on modest hardware, democratizing voice synthesis and reducing reliance on big cloud APIs.