Sigh.... the line is, "Behold… the heart of a shattered sun. A power that can slow the turning of the world." I don't know what happened here lol. LTX 2.3 image to video with audio support in ComfyUI.
Users report that LTX 2.3's image-to-video generation with audio plays the sound but fails to animate character speech.
A promising new feature in the LTX 2.3 AI model is hitting a significant snag, according to user reports on platforms like Reddit. The model, which integrates with the popular node-based workflow tool ComfyUI, recently added an "image to video with audio" capability. This function is designed to take a static character image and an MP3 audio file, then generate a video in which the character appears to speak in sync with the audio, a highly sought-after tool for content creators. However, multiple users have found that while the final video file contains the correct audio track, the character's lips are not animated to match. The result is an awkward performance in which the narration plays over a motionless mouth, undermining the feature's entire purpose.
The issue appears to be a generation or rendering bug within the LTX 2.3 pipeline itself. Community troubleshooting suggests the problem lies not in loading the image and audio assets but in the synthesis phase. This failure highlights the ongoing technical challenges of synchronizing multimodal AI outputs, especially for complex tasks like viseme (mouth shape) prediction from audio waveforms. For now, creators experimenting with this cutting-edge, open-source video generation tool are left with broken workflows and the humorous, yet frustrating, result of characters dramatically posed against epic narration with completely still mouths.
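For creators who want to confirm the symptom rather than eyeball it, one rough check is to measure how much the mouth area changes from frame to frame. The sketch below is only an illustration and is not part of LTX 2.3 or ComfyUI: the function names, the `box` coordinates, and the threshold are all hypothetical, and it assumes the video has already been decoded into grayscale frames stored as nested lists of pixel intensities.

```python
from typing import List, Sequence, Tuple

# A grayscale frame: rows of pixel intensities (assumed already decoded).
Frame = Sequence[Sequence[float]]


def region_motion(frames: Sequence[Frame],
                  box: Tuple[int, int, int, int]) -> List[float]:
    """Mean absolute pixel change inside `box` between consecutive frames.

    `box` is (top, bottom, left, right) with exclusive bottom/right edges.
    It is a hypothetical, hand-picked rectangle around the mouth.
    """
    top, bottom, left, right = box
    area = (bottom - top) * (right - left)
    if area <= 0:
        raise ValueError("box must cover at least one pixel")

    scores = []
    for prev, curr in zip(frames, frames[1:]):
        total = 0.0
        for y in range(top, bottom):
            for x in range(left, right):
                total += abs(curr[y][x] - prev[y][x])
        scores.append(total / area)
    return scores


def looks_static(frames: Sequence[Frame],
                 box: Tuple[int, int, int, int],
                 threshold: float = 1.0) -> bool:
    """True if the region never changes by more than `threshold` on average.

    The threshold is an arbitrary assumption and would need tuning per clip.
    Returns False when there are fewer than two frames to compare.
    """
    scores = region_motion(frames, box)
    return bool(scores) and max(scores) < threshold


# Tiny synthetic example: 2x2 frames, with the whole frame as the "mouth" box.
still_clip = [[[10, 10], [10, 10]] for _ in range(3)]
moving_clip = [[[0, 0], [0, 0]], [[0, 0], [40, 40]]]
whole_frame = (0, 2, 0, 2)

print(looks_static(still_clip, whole_frame))   # frozen region
print(looks_static(moving_clip, whole_frame))  # region that changes
```

A consistently near-zero score across a clip with clear speech in the audio would match the behavior users describe, though compression noise and camera motion can both skew a measure this simple.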
- LTX 2.3's new audio-driven video feature in ComfyUI includes the audio track in the output but fails to animate the character's lips to match it.
- The bug produces videos where the character's mouth stays still while the soundtrack plays, breaking the core functionality.
- The issue underscores the technical difficulty in achieving reliable audio-visual synchronization in open-source AI video models.
Why It Matters
Reliable lip-sync is crucial for AI video narration and avatar content; this bug blocks practical use for creators.