Evolution of Video Generative Foundations
A new 2026 survey traces video generation's path from early GANs to today's diffusion models and emerging auto-regressive approaches.
A research team from Shanghai Jiao Tong University and other institutions has published a survey titled 'Evolution of Video Generative Foundations' on arXiv. The paper, submitted in April 2026, presents what the authors describe as the first systematic review of the full technological arc of AI video generation. Rather than concentrating on a single technique, it follows the field from early Generative Adversarial Networks (GANs), through the current dominance of diffusion models such as OpenAI's Sora and Google's Veo3, to emerging auto-regressive (AR) and multimodal techniques.
The survey analyzes the foundational principles, key advances, and comparative strengths and limitations of each paradigm. It targets gaps in the existing literature by giving a comprehensive treatment of AR models and of multimodal integration, which is crucial for improving contextual awareness in generated videos. The authors also discuss how these advances are laying the groundwork for 'world models' that can simulate real-world dynamics.
Finally, the paper connects historical developments with contemporary innovations to draw practical directions for future research. It outlines applications in virtual and augmented reality, personalized education, autonomous driving simulation, and digital entertainment. By mapping the field's trajectory, the survey serves as a guide for researchers and practitioners working in the fast-moving area of video generative AI.
- Reviews the evolution of video AI from GANs through diffusion models (Sora, Veo3) to auto-regressive techniques.
- Addresses gaps in the literature by analyzing AR models and multimodal integration for better contextual awareness.
- Points future research toward 'world models' and applications in VR/AR, education, and autonomous driving simulation.
Why It Matters
This survey gives developers and researchers a roadmap of past breakthroughs and likely future directions in generative video AI.