Image & Video

LTX 2.3 can generate some really decent singing and music too

The new AI model produces vocals approaching the quality of Suno v3 and v4, a major leap in vocal realism over its predecessor.

Deep Dive

Lightricks, the company behind popular creative apps like Facetune, has launched a substantial update to its LTX AI model. Version 2.3 demonstrates a surprising leap in audio generation capabilities, particularly for singing and vocal performances. Early user experiments shared on social platforms suggest that the model's vocal output is now approaching the quality of dedicated AI music generators such as Suno's v3 and v4 models. This is a significant shift, as LTX is primarily known for image and video generation, and it points to a rapid convergence of multimodal AI abilities.

A key workflow highlighted by testers involves using the 'LTXGemmaEnhancePrompt' node to feed the model richly detailed, cinematic prompts. For instance, a prompt describing an indie folk singer in a dimly lit room, complete with lyrical snippets and descriptions of facial expressions, successfully generated a corresponding vocal performance. The AI captured not just the melody but also the emotive delivery described in the text. However, the model still shows limitations in generating high-quality, full-band instrumentation, with beats and bass lines often sounding hollow or synthetic. This indicates that while vocal synthesis is maturing quickly, coherent multi-instrument music generation remains a more complex challenge.
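As a rough illustration of what a "richly detailed" prompt might contain, the sketch below assembles the elements mentioned above (performer, setting, facial expression, vocal delivery, lyric snippets) into a single text prompt. It is only a plain-Python helper for composing the text: the `SingingScene` class, the `build_prompt` function, the field names, and the placeholder lyrics are all hypothetical, and nothing here calls the 'LTXGemmaEnhancePrompt' node or reflects its actual inputs.

```python
from dataclasses import dataclass, field


@dataclass
class SingingScene:
    """Hypothetical container for the details a tester might put in a prompt."""
    performer: str
    setting: str
    expression: str
    delivery: str
    lyrics: list[str] = field(default_factory=list)


def build_prompt(scene: SingingScene) -> str:
    """Join the scene details into one descriptive paragraph of plain text."""
    parts = [
        f"{scene.performer} in {scene.setting}.",
        f"Facial expression: {scene.expression}.",
        f"Vocal delivery: {scene.delivery}.",
    ]
    if scene.lyrics:
        # Lyric lines are separated with " / " and quoted as sung text.
        quoted = " / ".join(scene.lyrics)
        parts.append(f'They sing: "{quoted}"')
    return " ".join(parts)


# Illustrative values only; the lyric lines are placeholders.
example = SingingScene(
    performer="An indie folk singer",
    setting="a dimly lit room",
    expression="eyes closed, brow slightly furrowed",
    delivery="soft, breathy, building to a strained high note",
    lyrics=["placeholder first line", "placeholder second line"],
)
prompt_text = build_prompt(example)
print(prompt_text)
```

The resulting string would then be pasted or wired into whatever text input the workflow exposes. Keeping the scene details in separate fields makes it easy to vary one element, such as the delivery, while holding the rest constant when comparing outputs.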

The release of LTX 2.3 suggests that the boundaries between creative AI modalities (text, image, video, and audio) are eroding quickly. Rather than focusing on a single output type, companies are racing to build comprehensive, all-in-one generative platforms. For creators, this means prototyping tools are becoming more powerful and unified, allowing faster iteration from a text-based concept to a multi-sensory draft. The competition is pushing everyone, from startups like Suno to established companies like Lightricks, to improve their offerings rapidly.

Key Points
  • LTX 2.3's singing and vocal generation is now nearly on par with Suno v3 and v4, a specialized AI music tool, and a massive improvement over version 2.0.
  • The model excels with detailed, scene-setting prompts processed through the 'LTXGemmaEnhancePrompt' node, generating coherent audio from descriptive text.
  • Instrumentation like drums and bass still lags behind vocals, sounding artificial and hollow, highlighting an area for future development.

Why It Matters

This accelerates the trend toward all-in-one creative AI platforms, giving professionals a single tool for prototyping multi-format content from text.