ComfyUI-DramaBox now supports LoRAs and auto-dataset generation from audio

r/StableDiffusion May 17, 2026

⚡ New TTS node generates custom voices from just 10 short audio clips

Deep Dive

The ComfyUI-DramaBox node, originally built by Francky_B around the LTX-based TTS model DramaBox by u/manmaynakhashi, has been updated to support LoRA (Low-Rank Adaptation) weights. This allows users to fine-tune voice characteristics with minimal samples, enabling personalized text-to-speech generation directly in ComfyUI. To use the update, users point the node at the models/dramabox folder, the same location the companion tool reads its models from.
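For readers unfamiliar with LoRA, the general idea can be sketched in a few lines: instead of retraining a full weight matrix, a small low-rank pair of matrices is trained and added on top of the frozen weights. The sketch below is a generic illustration of that formula on toy numbers, not the DramaBox node's code; the function names `matmul` and `apply_lora` and the `alpha` scaling value are illustrative assumptions.

```python
def matmul(a, b):
    """Multiply two matrices stored as nested lists."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def apply_lora(weight, lora_a, lora_b, alpha):
    """Return weight + (alpha / rank) * (lora_b @ lora_a).

    weight: (out, in) frozen base weights
    lora_a: (rank, in) trained low-rank factor
    lora_b: (out, rank) trained low-rank factor
    """
    rank = len(lora_a)
    scale = alpha / rank
    delta = matmul(lora_b, lora_a)  # (out, rank) @ (rank, in) -> (out, in)
    return [
        [w + scale * d for w, d in zip(w_row, d_row)]
        for w_row, d_row in zip(weight, delta)
    ]


# Toy example: a 2x3 base weight with a rank-1 adapter.
weight = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]]
lora_a = [[1.0, 2.0, 3.0]]   # shape (1, 3)
lora_b = [[0.5], [-0.5]]     # shape (2, 1)
merged = apply_lora(weight, lora_a, lora_b, alpha=2.0)
print(merged)
```

Here the adapter holds 5 trained numbers against 6 in the base matrix; at real model sizes the gap is far larger, which is why LoRA training can work from a small dataset.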

Alongside the node update, Francky_B released Voice-Clone-Studio-DramaBox, a streamlined version of his larger TTS toolkit. It focuses exclusively on DramaBox, keeping only the Qwen-TTS voice design module for generating voice profiles. The standout feature is the Prep Sample tab, which automatically splits a single long audio file into phrase-level clips and transcribes them, producing a ready-to-use training dataset. The developer notes that 10 clips of 5–10 seconds yield better results than 80 longer clips, but warns that DramaBox is still prone to hallucinations, making this an experimental tool for voice cloning enthusiasts.
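The article does not describe how the Prep Sample tab chooses its cut points, so the following is only a rough sketch of one way phrase-level segments could be grouped into clips in the recommended 5–10 second range. It assumes the phrases already come with timestamps and text from some transcription step; the function `group_phrases` and its parameters are hypothetical and not part of Voice-Clone-Studio-DramaBox.

```python
def group_phrases(phrases, min_len=5.0, max_len=10.0, max_clips=10):
    """Greedily merge consecutive (start, end, text) phrases into clips.

    A clip keeps absorbing the next phrase while its total span stays
    within max_len seconds. Clips shorter than min_len or longer than
    max_len are discarded. At most max_clips clips are returned.
    """
    clips = []
    current = None  # (start, end, text) of the clip being built

    def flush(clip):
        # Keep the clip only if its duration falls in the target range.
        if clip is not None and min_len <= clip[1] - clip[0] <= max_len:
            clips.append(clip)

    for start, end, text in phrases:
        if current is None:
            current = (start, end, text)
        elif end - current[0] <= max_len:
            current = (current[0], end, current[2] + " " + text)
        else:
            flush(current)
            current = (start, end, text)
    flush(current)
    return clips[:max_clips]


# Hypothetical phrase timestamps (seconds) and transcripts.
phrases = [
    (0.0, 3.0, "Hello there."),
    (3.5, 7.0, "This is a test."),
    (7.5, 12.0, "Another phrase follows."),
    (12.5, 14.0, "Short."),
    (20.0, 32.0, "A very long phrase that runs past the limit."),
]
clips = group_phrases(phrases)
for start, end, text in clips:
    print(f"{start:.1f}-{end:.1f}s: {text}")
```

In this toy input the last phrase is 12 seconds long on its own, so it is dropped rather than split; a real tool would likely need a smarter fallback for long phrases.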

Key Points

ComfyUI-DramaBox now supports LoRA weights for custom voice style transfer.
Voice-Clone-Studio-DramaBox includes a Prep Sample tab that auto-chops long audio into transcribed clips (best: 10 clips, 5–10 seconds each).
The tool reuses models from ComfyUI's dramabox folder and is optimized for quick dataset generation, though hallucinations remain a limitation.

Why It Matters

Enables rapid, low-sample custom TTS voice generation in ComfyUI, lowering the bar for personalized audio synthesis.
