ComfyUI-DramaBox now supports LoRAs and auto-dataset generation from audio
New TTS node generates custom voices from just 10 short audio clips
The ComfyUI-DramaBox node, originally built by Francky_B around the LTX-based TTS model DramaBox by u/manmaynakhashi, has been updated to support LoRA (Low-Rank Adaptation) weights. This lets users fine-tune voice characteristics from a small number of audio samples and generate personalized text-to-speech directly in ComfyUI. After updating, users must point the node to the models/dramabox folder, the same location the companion tool reads its models from.
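For readers unfamiliar with LoRA, the general idea can be sketched in a few lines of plain Python. This is a generic illustration of a low-rank weight update, not code from ComfyUI-DramaBox: the function names, the matrix sizes, and the `scale` parameter are all assumptions chosen for the example.

```python
# Illustrative sketch of the LoRA idea; NOT taken from ComfyUI-DramaBox.
# The base weight stays frozen and only two small matrices are trained.

def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)]
            for row in a]

def apply_lora(weight, lora_b, lora_a, scale=1.0):
    """Return weight + scale * (lora_b @ lora_a) without modifying the inputs."""
    delta = matmul(lora_b, lora_a)
    return [[w + scale * d for w, d in zip(w_row, d_row)]
            for w_row, d_row in zip(weight, delta)]

# Hypothetical frozen 3x3 base weight (9 values).
base = [[1.0, 0.0, 0.0],
        [0.0, 1.0, 0.0],
        [0.0, 0.0, 1.0]]

# Hypothetical rank-1 adapter: a 3x1 and a 1x3 matrix (6 values in total).
lora_b = [[1.0], [2.0], [3.0]]
lora_a = [[0.5, 0.0, -0.5]]

adapted = apply_lora(base, lora_b, lora_a, scale=0.1)
```

At this toy size the adapter is barely smaller than the weight it modifies. The saving grows with layer size, because the adapter scales with the sum of the dimensions rather than their product.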
Alongside the node update, Francky_B released Voice-Clone-Studio-DramaBox, a streamlined version of his larger TTS toolkit. It focuses exclusively on DramaBox, keeping only the Qwen-TTS voice design module for generating voice profiles. The standout feature is the Prep Sample tab, which automatically splits a single long audio file into phrase-level clips and transcribes them, producing a ready-to-use training dataset. The developer notes that 10 clips of 5–10 seconds yield better results than 80 longer clips, but warns that DramaBox is still prone to hallucinations, making this an experimental tool for voice cloning enthusiasts.
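The source does not describe how the Prep Sample tab decides where to cut. One common approach is to split on pauses and then keep only clips in the recommended length range, sketched below under that assumption. The function names, the amplitude threshold, and the minimum pause length are hypothetical, and the transcription step is left out.

```python
# Assumption-laden sketch: silence-based splitting plus a duration filter.
# This is NOT the tool's actual algorithm, which the source does not specify.

def split_on_silence(samples, sample_rate, threshold=0.02, min_silence_s=0.3):
    """Return (start, end) sample indices of non-silent segments.

    A segment is closed once at least min_silence_s of consecutive samples
    have an absolute amplitude below threshold. `end` is exclusive.
    """
    min_gap = max(1, round(min_silence_s * sample_rate))
    segments = []
    start = None      # index where the current segment began
    quiet_run = 0     # consecutive quiet samples inside the current segment
    for i, s in enumerate(samples):
        if abs(s) >= threshold:
            if start is None:
                start = i
            quiet_run = 0
        elif start is not None:
            quiet_run += 1
            if quiet_run >= min_gap:
                # Close the segment just after its last loud sample.
                segments.append((start, i - quiet_run + 1))
                start = None
                quiet_run = 0
    if start is not None:
        # Audio ended mid-segment; trim any trailing quiet samples.
        segments.append((start, len(samples) - quiet_run))
    return segments

def keep_clips(segments, sample_rate, min_s=5.0, max_s=10.0):
    """Keep only segments whose duration falls within [min_s, max_s] seconds."""
    return [(a, b) for a, b in segments
            if min_s <= (b - a) / sample_rate <= max_s]

# Synthetic "recording" at 100 samples per second:
# 6 s speech, 1 s pause, 2 s speech, 1 s pause, 8 s speech, 0.5 s pause.
rate = 100
audio = ([0.5] * 600 + [0.0] * 100 + [0.5] * 200 +
         [0.0] * 100 + [0.5] * 800 + [0.0] * 50)

segments = split_on_silence(audio, rate)
clips = keep_clips(segments, rate)
```

With this synthetic input the 2-second phrase is dropped and the 6-second and 8-second phrases are kept, matching the 5–10 second guideline. Real recordings would need a threshold tuned to their noise floor.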
- ComfyUI-DramaBox now supports LoRA weights for fine-tuning custom voices.
- Voice-Clone-Studio-DramaBox includes a Prep Sample tab that automatically splits long audio into transcribed clips (the developer recommends about 10 clips of 5–10 seconds each).
- The tool reuses models from ComfyUI's dramabox folder and is optimized for quick dataset generation, though hallucinations remain a limitation.
Why It Matters
Enables rapid, low-sample custom TTS voice generation in ComfyUI, lowering the bar for personalized audio synthesis.