Image & Video

I think I figured out how to fix the audio issues in LTX 2.3

A user's ComfyUI workflow tweaks reduce the metallic hiss common in LTX 2.3 audio output.

Deep Dive

A community-driven workaround has emerged for the audio quality issues in Lightricks' LTX 2.3, a video model that also generates audio. A user experimenting with the official ComfyUI workflows reports two modifications that noticeably reduce the metallic hiss and artifacts heard in many generations. The main change replaces the workflow's LTXV scheduler with a standard BasicScheduler, which the user says produces cleaner, more structured audio right away. The second change applies to workflows that use the distilled (smaller, faster) model: the sampling steps are split so that the full dev model runs the first steps and the distilled model refines the rest, which adds detail and variety and reduces artifacts.

These tweaks, shared as a public ComfyUI workflow file, are a practical workaround for a problem many users of the model have reported. The fix is notable because it changes a core sampling component (the scheduler) and divides the inference steps between the two model sizes, rather than adjusting surface-level settings. For creators, this means LTX 2.3 generations can carry cleaner audio, with less of the digital noise that has been a major pain point, which makes the sound in generated clips more usable in professional projects.

Key Points
  • Replacing the LTXV scheduler with a BasicScheduler in ComfyUI workflows produces cleaner, less metallic audio.
  • For the distilled model, splitting sigmas (4 steps with the dev model, then 4 with the distilled model) reduces artifacts.
  • The shared workflow offers a direct, user-tested workaround for a common audio quality issue in LTX 2.3.
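
The sigma split in the second point can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the shared workflow itself: `linear_sigmas` is a hypothetical stand-in for whatever schedule the BasicScheduler node actually produces, and `split_sigmas` assumes the two halves share the boundary value, which is how ComfyUI's SplitSigmas node is understood to slice a schedule.

```python
def linear_sigmas(steps, sigma_max=1.0, sigma_min=0.0):
    """Hypothetical schedule: steps + 1 evenly spaced noise levels, high to low.

    A real scheduler would produce a different curve; only the length matters here.
    """
    span = sigma_max - sigma_min
    return [sigma_max - span * i / steps for i in range(steps + 1)]


def split_sigmas(sigmas, step):
    """Split a schedule at `step`; both halves keep the boundary sigma (assumed)."""
    return sigmas[:step + 1], sigmas[step:]


# 8 sampling steps in total: 4 for the dev model, 4 for the distilled model.
sigmas = linear_sigmas(8)
dev_sigmas, distilled_sigmas = split_sigmas(sigmas, 4)

print("dev model sigmas:      ", dev_sigmas)
print("distilled model sigmas:", distilled_sigmas)
```

With 8 total steps the schedule holds 9 values, so each model receives 5 of them (4 steps each), and the distilled model starts from the same noise level at which the dev model stopped.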

Why It Matters

This workaround improves the quality of the audio LTX 2.3 generates, making the model more viable for professional sound design and content creation.