LTX 2.3: Official Workflows and Pipelines Comparison

Analysis finds that the official high-quality workflow is a two-stage pipeline that uses specific distilled LoRA strengths in each stage.

Deep Dive

A technical deep dive into Lightricks' LTX 2.3 video generation model finds significant differences between its official workflows, which helps explain why results for prompts such as the viral 'Will Smith eating spaghetti' vary so widely. The investigation, which used Gemini to analyze the official GitHub repositories, found that generation runs in two stages: Stage 1 creates the initial video and Stage 2 upscales it. The highest fidelity depends on the Stage 1 configuration: the `res_2s` sampler run through a ClownSampler node, the MultiModalGuider for better frame consistency, and a distilled LoRA strength of 0.25. The analysis contrasts this High-Quality (HQ) I2V pipeline with a balanced A2V pipeline and an ultra-fast distilled version from the Desktop App.

The trade-offs are stark. The HQ pipeline demands the most resources: it holds two ledgers in VRAM and runs about 15 steps in Stage 1, followed by 3 steps in Stage 2 with the distilled LoRA strength raised to 0.5. The Desktop App's 'Maximum Speed' version, by contrast, uses a fully distilled model with baked-in weights, a simple CFGGuider, only 8 steps in Stage 1, and a single ledger for ultra-low VRAM usage. According to the analysis, the commonly shared ComfyUI templates are tuned for speed, so replicating the official HQ results requires precise, resource-intensive settings that many users may not be applying, which would account for the observed quality disparity.
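The settings reported above can be laid out side by side as plain data. The sketch below is only an illustration: the class names, field names, and the `diff_stage1` helper are hypothetical and are not taken from the LTX repositories or from ComfyUI. The values are the ones quoted in this article, and `None` marks anything the article does not state.

```python
from dataclasses import asdict, dataclass
from typing import Optional


@dataclass(frozen=True)
class StageSettings:
    """Hypothetical container for one stage's reported settings."""
    steps: Optional[int]
    lora_strength: Optional[float]  # None: baked into the weights or not stated
    sampler: Optional[str] = None   # None: not stated in the article
    guider: Optional[str] = None


@dataclass(frozen=True)
class PipelineSettings:
    """Hypothetical container for a whole pipeline's reported settings."""
    name: str
    ledgers_in_vram: int
    stage1: StageSettings
    stage2: Optional[StageSettings]  # None: not described in the article


# HQ I2V pipeline, as reported (Stage 1 runs "about 15" steps).
HQ = PipelineSettings(
    name="HQ",
    ledgers_in_vram=2,
    stage1=StageSettings(steps=15, lora_strength=0.25,
                         sampler="res_2s", guider="MultiModalGuider"),
    stage2=StageSettings(steps=3, lora_strength=0.5),
)

# Desktop App 'Maximum Speed' version, as reported.
MAX_SPEED = PipelineSettings(
    name="Maximum Speed",
    ledgers_in_vram=1,
    stage1=StageSettings(steps=8, lora_strength=None, guider="CFGGuider"),
    stage2=None,
)


def diff_stage1(a: PipelineSettings, b: PipelineSettings) -> dict:
    """Return the Stage 1 fields whose values differ, as {field: (a, b)}."""
    left, right = asdict(a.stage1), asdict(b.stage1)
    return {k: (left[k], right[k]) for k in left if left[k] != right[k]}


if __name__ == "__main__":
    for field, (hq, fast) in diff_stage1(HQ, MAX_SPEED).items():
        print(f"{field}: HQ={hq!r}, Maximum Speed={fast!r}")
```

Comparing a local workflow's values against a table like this is one way to check whether it matches the HQ settings or the speed-tuned ones.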

Key Points
  • The HQ pipeline uses a `res_2s` sampler and MultiModalGuider in Stage 1 for maximum frame fidelity.
  • In the HQ pipeline, the distilled LoRA strength is 0.25 in Stage 1 and 0.5 in Stage 2, which differs from the default templates.
  • The Desktop App's distilled version uses only 8 steps and a single ledger for speed, sacrificing some quality.

Why It Matters

Users can now choose the correct LTX 2.3 workflow for their needs, balancing VRAM, speed, and output quality.