Pushing LTX 2.3 to the Limit: Rack Focus + Dolly Out Stress Test [Image-to-Video]

r/StableDiffusion March 11, 2026

⚡A stress test combining a dolly out and rack focus reveals the model's structural limitations under complex motion.

Deep Dive

A user's stress test has pushed the LTX 2.3 image-to-video model to its breaking point, revealing significant limitations in how it handles complex cinematic motion. The test ran the built-in ComfyUI workflow on an NVIDIA RTX 4090 and aimed to generate a 7-second, 1080p video from a detailed prompt. The prompt described a slow dolly out combined with a rapid rack focus, shifting from a cyborg woman's face to her mechanical hands as they extend toward the camera, all rendered with specific anamorphic lens characteristics. The goal was not a perfect output but to find where the model fails under demanding spatial and focal transformations.

The results clearly illustrate current constraints in generative video. LTX 2.3 failed to execute the core camera movement, generating only the arm extension with no dolly back. More critically, the model's temporal coherence collapsed: the intricate, rigid mechanical geometry of the cyborg's hands dissolved into an unstructured 'pixel soup' as they moved into the foreground. The requested vintage Cooke anamorphic lens bokeh was also ignored and replaced by a standard digital blur. The test suggests that while LTX 2.3 can manage static scenes or subtle motion, combining aggressive object movement toward the camera with extreme depth-of-field changes breaks its ability to maintain structural integrity, pointing to a key challenge for future model development.
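
Because the test stacked several directives in one prompt, it does not show which one triggers the collapse. One way to narrow that down is to generate a prompt for every combination of directives and run each separately. The sketch below is only an illustration: the base description and the rack focus wording are paraphrases rather than the original prompt, and the names `BASE`, `DIRECTIVES`, and `ablation_prompts` are hypothetical.

```python
from itertools import combinations

# Hypothetical paraphrase of the subject; not the original prompt text.
BASE = "a cyborg woman extends her mechanical hands toward the camera"

# Directive phrases. "camera" and "lens" reuse wording quoted in the summary;
# "focus" is a paraphrase assumed for illustration.
DIRECTIVES = {
    "camera": "cinematic slow dolly out",
    "focus": "rapid rack focus from face to hands",
    "lens": "Cooke Anamorphic lens bokeh",
}


def ablation_prompts(base, directives):
    """Return one prompt per subset of directives, keyed by the subset."""
    keys = list(directives)
    prompts = {}
    for size in range(len(keys) + 1):
        for subset in combinations(keys, size):
            parts = [base] + [directives[k] for k in subset]
            prompts[subset] = ", ".join(parts)
    return prompts


if __name__ == "__main__":
    for subset, prompt in ablation_prompts(BASE, DIRECTIVES).items():
        label = "+".join(subset) or "baseline"
        print(f"{label}: {prompt}")
```

Three directives yield eight prompts, from the bare baseline to the full combination. Comparing the clips side by side would show whether the hands break down with the dolly out alone, the rack focus alone, or only when both are requested.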

Key Points

The test on an RTX 4090 generated a 1080p clip in 284 seconds on a warm start, but the model ignored the 'cinematic slow dolly out' camera movement entirely.
The cyborg's detailed mechanical hands lost all structural integrity, melting into 'pixel soup' during the foreground movement, showing a failure of temporal coherence.
Specific artistic directives like 'Cooke Anamorphic lens' bokeh were not followed, defaulting to generic blur, highlighting a gap in prompt adherence for complex cinematography.
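
To put the reported timing in perspective, the arithmetic can be sketched as below. The 284-second render time and 7-second clip length come from the summary; the frame rate is not stated there, so the `fps=24` value is purely an assumption for illustration, and the function names are hypothetical.

```python
def seconds_per_video_second(render_seconds, clip_seconds):
    """Compute time spent per second of generated video."""
    if clip_seconds <= 0:
        raise ValueError("clip_seconds must be positive")
    return render_seconds / clip_seconds


def seconds_per_frame(render_seconds, clip_seconds, fps):
    """Compute time spent per generated frame at a given frame rate."""
    if fps <= 0:
        raise ValueError("fps must be positive")
    frames = clip_seconds * fps
    return seconds_per_video_second(render_seconds, clip_seconds) * clip_seconds / frames


if __name__ == "__main__":
    ratio = seconds_per_video_second(284, 7)
    print(f"{ratio:.1f} s of compute per second of video")  # 40.6
    # fps=24 is an assumed value; the summary does not state the frame rate.
    print(f"{seconds_per_frame(284, 7, fps=24):.2f} s per frame")  # 1.69
```

At roughly 40 seconds of compute per second of output on a warm start, each retry of a failed shot is cheap enough to iterate on, but not fast enough for interactive tuning.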

Why It Matters

For professionals, the test marks a current practical boundary for AI video generation: intricate mechanical motion combined with camera moves remains a major unsolved challenge.
