I got tired of manually prompting every single clip for my AI music videos, so I built a 100% local open-source (LTX Video desktop + Gradio) app to automate it, meet - Synesthesia
Open-source tool generates 3-minute music videos in under an hour using 100% local processing.
Developer Rowan Underwood has released Synesthesia, a novel open-source tool that automates the labor-intensive process of creating AI-generated music videos. The application runs entirely locally, requiring three key inputs: an isolated vocal stem, the full band performance audio, and a text file of the lyrics. Users also provide a rough concept, which Synesthesia feeds to a locally-run large language model (LLM) like Qwen3.5-9B via LM Studio or llama.cpp. The LLM then generates a creative brief, including an appropriate singer persona and a plotline for the video.
The core innovation is the automated shot list. Synesthesia analyzes the audio to detect singing sections, creating a timeline that cuts to the vocal performance during lyrics and back to the narrative "story" during instrumental parts. This shot list, with video prompts written by the LLM, can be fully automatic or manually tweaked frame-by-frame. The tool then interfaces with the LTX Video Desktop application (not an official API) to render the video. The developer switched from ComfyUI due to speed issues, achieving a significant performance boost: a 3-minute, 540p first-pass video can be generated in under an hour on a high-end RTX 5090 GPU. Users can generate multiple "takes" per shot, review them in a "cutting room floor" directory, and assemble the final edit.
- Processes vocal stems, full audio, and lyrics to generate video concepts using a local LLM like Qwen3.5-9B.
- Automatically creates an editable shot list that cuts between performance and story based on vocal detection.
- Renders a 3-minute, 540p video in under an hour on an RTX 5090 by interfacing with LTX Video Desktop.
Why It Matters
Dramatically lowers the technical barrier and time investment for musicians and creators to produce custom, narrative-driven AI music videos.