How do closed-source models get their generation times so low?
An RTX 6000 and a B200 GPU produce identically slow speeds for open-source LTX 2.3 video generation.
A viral Reddit post has exposed a startling performance chasm between open-source and closed-source AI video generation models. A user conducting hands-on tests found that the open-source LTX 2.3 model generated 840x480 videos at a sluggish pace of 10-12 seconds per iteration, even when running on vastly different hardware—from a consumer RTX 5070 Ti to a professional RTX 6000 Ada and finally a cutting-edge, $30,000 NVIDIA B200 GPU. The identical speed across this hardware spectrum points to a fundamental bottleneck within the model architecture or inference pipeline itself, not a lack of computational power.
This slow performance stands in stark contrast to closed-source competitors like xAI's Grok, which reportedly produces videos of similar resolution in a mere 6-10 seconds for the entire generation: less time than a single LTX iteration. The post has ignited debate in the AI community over whether the gap is due to superior model distillation, proprietary optimizations, or undisclosed architectural breakthroughs that open-source projects have yet to replicate. The discussion underscores that for real-time applications, raw model capability is meaningless without equally advanced inference engineering.
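To see why the per-iteration figure understates the gap, a back-of-envelope comparison helps. The step count below is an assumption (the post does not report LTX's sampler settings); diffusion samplers commonly run tens of denoising steps, so a full generation multiplies the per-step cost accordingly.

```python
# Back-of-envelope total-time comparison. The 10-12 s/iteration figure is
# from the Reddit post; the 30-step count is a hypothetical assumption,
# not a measured or documented LTX setting.

def total_generation_time(seconds_per_step: float, num_steps: int) -> float:
    """Total wall-clock time for a multi-step diffusion sampling run."""
    return seconds_per_step * num_steps

# Hypothetical open-source run: 11 s/step (mid-range) over an assumed 30 steps.
ltx_total = total_generation_time(11.0, 30)  # 330 s, about 5.5 minutes

# Grok's reported end-to-end time (mid-range of the 6-10 s claim).
grok_total = 8.0

print(f"LTX (assumed 30 steps): {ltx_total:.0f} s")
print(f"Grok (reported total):  {grok_total:.0f} s")
print(f"Speed ratio: {ltx_total / grok_total:.1f}x")
```

Under these assumed numbers the end-to-end gap is far larger than 10x, which is why step-count reduction (the point of distillation) matters more than per-step kernel speed.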
- Open-source LTX 2.3 generated video at 10-12 s/iteration on hardware ranging from a consumer RTX 5070 Ti to a professional RTX 6000 and a flagship B200 GPU.
- Closed-source models like xAI's Grok reportedly produce similar 840x480 videos in just 6-10 seconds total, faster than a single LTX iteration and at least a 10x advantage over a full multi-step generation.
- The test suggests the performance gap is due to software and model optimization, not just access to better hardware.
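A per-iteration figure like the one in the post can be measured with a simple timing harness. The sketch below uses a placeholder step function, since wiring in the real LTX sampler is model-specific; on a GPU one would also synchronize the device (e.g. `torch.cuda.synchronize()`) before each clock read so queued kernels are not missed.

```python
# Minimal per-iteration timing harness (sketch). `denoise_step` is a
# hypothetical stand-in for one diffusion sampling iteration so that the
# example runs anywhere without model weights or a GPU.
import time

def denoise_step() -> None:
    # Placeholder workload standing in for one denoising iteration.
    time.sleep(0.01)

def seconds_per_iteration(step_fn, num_steps: int = 5) -> float:
    """Average wall-clock seconds per call of step_fn over num_steps calls."""
    start = time.perf_counter()
    for _ in range(num_steps):
        step_fn()
    return (time.perf_counter() - start) / num_steps

print(f"{seconds_per_iteration(denoise_step):.3f} s/iteration")
```

The same loop run on an RTX 5070 Ti, an RTX 6000, and a B200 with the real sampler plugged in would reproduce the post's cross-hardware comparison.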
Why It Matters
For AI video to be practical, generation must be fast and affordable; this gap shows the open-source ecosystem is lagging in critical inference optimization.