Image & Video

Lightricks' LTX-2.3 LipDub syncs speech to video with a 22B-parameter model

Dwight Schrute lip-syncs a changelog with stunning realism using a new AI model

Deep Dive

Lightricks, the company behind popular image and video AI tools, has released LTX-2.3 LipDub, an In-Context LoRA (IC-LoRA) for its 22-billion-parameter LTX-2.3 model, designed for precise lip-syncing. In a viral test, a user dubbed Dwight Schrute from The Office reading a changelog, using a clip with a static camera and a single subject speaking directly to camera. The model held sync through natural cadence and pauses, which suggests it is well suited to talking-head video formats.

The model's key strength is its ability to maintain sync through realistic speech patterns, including hesitations and emphasis, without breaking immersion. The workflow, shared on Hugging Face, lets users replace the audio in an existing video and regenerates the speaker's lip movements to match the new track. This opens up possibilities for professional dubbing and content localization, and possibly for deepfake mitigation, where realistic lip-sync output can serve as a baseline.

Key Points
  • Lightricks' LTX-2.3 LipDub is an In-Context LoRA (IC-LoRA) for the 22B-parameter LTX-2.3 model, built for precise lip-sync.
  • Tested with a static camera, single subject talking directly to camera; sync held through natural cadence and pauses.
  • Workflow available on Hugging Face, enabling realistic dubbing of existing video content.

Why It Matters

Makes high-quality video dubbing more accessible, enabling realistic lip-sync for talking-head content.