Scope LTX-2.3 Now Has IC-LoRA & Audio-In Support
Generate video of a person speaking with their actual voice in one pass.
Scope's LTX-2.3 update brings two major features: ID-LoRA and IC-LoRA support. ID-LoRA (Identity-Driven Audio-Video) enables zero-shot generation of a person speaking with their actual voice from just a reference image and a 5-second audio clip, all in a single model pass with no cascaded pipeline. The LoRA weights download automatically; users simply flip Audio Mode to id_lora in the UI. This eliminates the need for separate voice and video models.
IC-LoRAs (In-Context LoRAs) support a range of video editing tasks, including Edit Anything, Union Control (canny, depth, pose), Anime2Real, Inpaint, Outpaint, and video restoration (sharpen, decompress, remove color grading). They add less than 10% compute overhead and work with FP8 quantization. The update also fixes audio-video synchronization, adds real-time pacing for smooth playback, and introduces a cloud mode for running on remote H100s. Limitations remain on frame count and resolution, and IC-LoRAs aren't fully supported in cloud inference yet.
- ID-LoRA generates video of a person speaking with their actual voice from a reference image and 5-second audio clip in a single model pass.
- IC-LoRAs support text-based video editing, style transfer, and restoration with under 10% compute overhead and FP8 quantization.
- Audio-video sync is fixed, real-time pacing is added, and a new cloud mode runs inference on remote H100s for users without a 4090.
Why It Matters
Real-time, single-pass, voice-driven video generation lowers the hardware and pipeline barriers to professional AI video creation, and cloud mode extends it to creators without high-end GPUs.