Image & Video

Last week in Generative Image & Video

Numina fixes AI's object counting problem, while Inspatio World turns videos into explorable 4D environments.

Deep Dive

The open-source AI community delivered notable advances in generative video last week, tackling core challenges like object consistency and temporal control. The standout project, Numina, addresses AI's notorious object counting problem. By reading the model's attention maps during generation, it detects and corrects counting errors without any model retraining, so a prompt asking for three cats yields exactly three. This is a meaningful step toward reliable, controllable video synthesis.
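To make the attention-map idea concrete, here is a minimal sketch of the detection half only. It is not Numina's actual algorithm: it assumes a single 2D attention map for the object token, given as a list of rows of floats, and treats each connected region above a threshold as one object instance. The names count_instances and count_mismatch, the threshold value, and the toy map are all hypothetical.

```python
from collections import deque


def count_instances(attn_map, threshold=0.5):
    """Count 4-connected regions whose attention weight exceeds threshold.

    Illustrative assumption: one bright region == one object instance.
    """
    rows = len(attn_map)
    cols = len(attn_map[0]) if rows else 0
    seen = [[False] * cols for _ in range(rows)]
    count = 0
    for r in range(rows):
        for c in range(cols):
            if seen[r][c] or attn_map[r][c] <= threshold:
                continue
            # Found an unvisited bright cell: flood-fill its whole region.
            count += 1
            seen[r][c] = True
            queue = deque([(r, c)])
            while queue:
                y, x = queue.popleft()
                for ny, nx in ((y - 1, x), (y + 1, x), (y, x - 1), (y, x + 1)):
                    if (0 <= ny < rows and 0 <= nx < cols
                            and not seen[ny][nx]
                            and attn_map[ny][nx] > threshold):
                        seen[ny][nx] = True
                        queue.append((ny, nx))
    return count


def count_mismatch(attn_map, requested, threshold=0.5):
    """Return detected minus requested; 0 means the count matches the prompt."""
    return count_instances(attn_map, threshold) - requested


# Toy attention map for the token "cat" with three separate bright regions.
toy = [
    [0.9, 0.8, 0.0, 0.0, 0.7],
    [0.9, 0.0, 0.0, 0.0, 0.7],
    [0.0, 0.0, 0.0, 0.0, 0.0],
    [0.0, 0.6, 0.6, 0.0, 0.0],
]

print(count_instances(toy))    # 3
print(count_mismatch(toy, 4))  # -1: one cat short of the requested four
```

A nonzero mismatch is the signal a correction step would act on. How Numina actually performs that correction is not described here, so the sketch stops at detection.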

Another leap forward comes from Inspatio World, which transforms ordinary 2D videos into navigable 4D environments (3D space plus time). Users can walk around the reconstructed 3D scene and scrub through time with minimal visual drift, all on consumer-grade GPUs. This opens new doors for immersive content creation and interactive media.

Additional innovations include Prompt Relay, a training-free method for precise temporal control in multi-event video generation that works with models like Wan2.2 and CogVideo. Google also contributed FIT, a 1.13M-triplet dataset built on FLUX.1 for physics-aware virtual clothing try-on. Using FLUX.1 with LoRA, it beats IDM-VTON on fit metrics. Together, these tools show how quickly AI video generation is becoming more precise, controllable, and accessible.

Key Points
  • Numina corrects object counting errors in generated videos by reading attention maps, so the output matches the count requested in the prompt without retraining.
  • Inspatio World reconstructs explorable 4D worlds from standard video, enabling 3D navigation and time scrubbing on consumer GPUs.
  • Google's FIT dataset (1.13M triplets) enables physics-aware virtual try-on, beating IDM-VTON on fit metrics using FLUX.1 + LoRA.

Why It Matters

These tools make AI video generation more reliable: accurate object counts, navigable scenes, and better-fitting virtual try-ons give professionals in media and e-commerce more precise control over generated output.