SegTune gives you fine-grained control over AI song generation
Select specific segments, adjust lyrics, and control musical dynamics per segment.
Most AI song generators can create tunes from lyrics and global prompts, but they treat the entire song as one monolithic block, which makes it nearly impossible to adjust a specific verse, chorus, or instrumental break. SegTune changes that. Built on a Diffusion Transformer architecture, the framework lets users (or large language models) assign localized textual descriptions to individual song segments. Each segment prompt is temporally broadcast across the time window that its segment occupies, while a global prompt keeps the overall style coherent. The system also includes an LLM-based duration predictor that autoregressively generates sentence-level timestamps in LRC (LyRiCs) format, so that each lyric line is aligned with the music. To train the model, the authors built a large-scale data pipeline that collects high-quality songs with aligned lyrics and prompts, and they propose new metrics that evaluate segment alignment and vocal consistency.
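To make the idea of temporal broadcasting concrete, here is a minimal sketch, not SegTune's actual code. It assumes the song is divided into fixed-rate frames and that each frame receives the global prompt plus the local prompt of whichever segment covers it. The names `Segment` and `broadcast_prompts`, the frame rate, and the use of plain strings in place of text embeddings are all illustrative assumptions.

```python
from dataclasses import dataclass


@dataclass
class Segment:
    start: float  # segment start, in seconds
    end: float    # segment end, in seconds (exclusive)
    prompt: str   # local description for this segment (hypothetical field)


def broadcast_prompts(segments, total_seconds, frames_per_second, global_prompt):
    """Return one (global_prompt, local_prompt) pair per frame.

    Frames not covered by any segment keep an empty local prompt.
    """
    n_frames = int(round(total_seconds * frames_per_second))
    frames = [(global_prompt, "")] * n_frames
    for seg in segments:
        first = max(0, int(seg.start * frames_per_second))
        last = min(n_frames, int(seg.end * frames_per_second))
        for i in range(first, last):
            # Copy the segment's prompt onto every frame in its time window.
            frames[i] = (global_prompt, seg.prompt)
    return frames


segments = [
    Segment(0.0, 2.0, "soft piano intro"),
    Segment(2.0, 5.0, "energetic chorus with drums"),
]
frames = broadcast_prompts(
    segments, total_seconds=6.0, frames_per_second=2, global_prompt="pop, female vocal"
)
```

In a real system the strings would presumably be replaced by embedding vectors, but the indexing logic (one local condition per time step, one global condition shared by all) would stay the same.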
SegTune was accepted to ACL 2026 as an oral presentation and has been nominated for the Best Paper Award. In experiments, it outperformed existing baselines in both musicality and controllability. This means creators could soon use AI to compose full songs with verse-by-verse control, adjusting tempo, instrumentation, or vocal style for each section without rewriting the whole piece. The approach also opens the door for LLMs to act as intelligent music producers that parse natural language instructions and generate complex, structured compositions. Code and generated songs are available on the project page.
- SegTune builds on a Diffusion Transformer and accepts a local description for each segment, enabling fine-grained control over song structure.
- An LLM-based duration predictor outputs sentence-level timestamps in LRC (LyRiCs) format for precise lyric-to-music alignment.
- Accepted to ACL 2026 as an oral presentation and nominated for Best Paper, SegTune outperforms baselines in both musicality and controllability.
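For readers unfamiliar with sentence-level timestamps, the sketch below shows how lyrics in the common `[mm:ss.xx]text` LRC layout can be turned into timed lines. It illustrates the file format only and is not the duration predictor itself. The helper names `parse_lrc` and `with_durations`, the sample lyrics, and the `song_end` parameter are assumptions made for the example.

```python
import re

# Matches lines such as "[00:12.50]Some lyric text".
LRC_LINE = re.compile(r"\[(\d+):(\d+(?:\.\d+)?)\](.*)")


def parse_lrc(text):
    """Return a list of (start_seconds, lyric) pairs from LRC-style text."""
    entries = []
    for line in text.splitlines():
        match = LRC_LINE.match(line.strip())
        if not match:
            continue  # skip blank lines and non-timestamp tags
        minutes, seconds, lyric = match.groups()
        entries.append((int(minutes) * 60 + float(seconds), lyric.strip()))
    return entries


def with_durations(entries, song_end):
    """Attach an end time to each line: the next line's start, or song_end."""
    timed = []
    for i, (start, lyric) in enumerate(entries):
        end = entries[i + 1][0] if i + 1 < len(entries) else song_end
        timed.append((start, end, lyric))
    return timed


sample = """[00:12.50]First line of the verse
[00:17.00]Second line of the verse
[01:02.25]Chorus begins"""

timed = with_durations(parse_lrc(sample), song_end=70.0)
```

Each resulting tuple gives the window in which one lyric line should be sung, which is the kind of alignment information the duration predictor is described as producing.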
Why It Matters
Music producers and AI tool builders gain structured control over song segments, paving the way for professional-grade AI composition.