Research & Papers

World Models survey: 4-axis taxonomy unifies PlaNet, Sora, Genie

World models for AGI: a 4-axis taxonomy spans Dreamer to Sora and highlights a convergence with chain-of-thought reasoning.

Deep Dive

A new arXiv paper (2606.00133) by a team of 26 researchers from multiple institutions offers what the authors present as the first unified framework for world models: internal simulators that learn environment dynamics to enable prediction, planning, and reasoning. The survey organizes the fast-growing field along a four-axis taxonomy:
  • Architecture: representation format, dynamics formulation, input modality, learning paradigm, and downstream tasks.
  • Methodological family: state-space/recurrent nets, transformers, diffusion generators, physics-informed networks, and language-augmented multimodal systems.
  • Reasoning strategy: imagination-based planning, latent policy learning, counterfactual reasoning, and planning under uncertainty.
  • Application domain: robotics, autonomous driving, video prediction, multimodal agents, reinforcement learning, scientific modeling, medical imaging, education, and finance.

The paper traces the lineage from cognitive science foundations through milestone systems such as PlaNet (2018), the Dreamer family, MuZero, Sora, Cosmos, and Genie, illustrating how the four axes interact.
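To make "imagination-based planning" concrete, here is a minimal Python sketch that is not taken from the paper. It assumes a toy 1-D world, a hand-written `model_step` standing in for a learned dynamics model, and a hypothetical three-action set. The planner rolls every short action sequence forward inside the model, scores each imagined trajectory, and executes only the first action of the best one before replanning.

```python
from itertools import product

ACTIONS = (-1, 0, 1)  # hypothetical discrete action set for this toy world


def model_step(state, action):
    """Stand-in for a learned dynamics model: predict the next state.

    Assumption: a real world model would be a trained network; this
    hand-written rule just moves a 1-D position by the action.
    """
    return state + action


def imagined_return(state, actions, goal):
    """Roll an action sequence forward inside the model and score it.

    The score is the negative summed distance to the goal, so higher is better.
    """
    total = 0
    for action in actions:
        state = model_step(state, action)
        total -= abs(goal - state)
    return total


def plan(state, goal, horizon=3):
    """Evaluate every action sequence in imagination; return the best first action."""
    best = max(
        product(ACTIONS, repeat=horizon),
        key=lambda seq: imagined_return(state, seq, goal),
    )
    return best[0]


def run_episode(start, goal, steps):
    """Replan at every step and record the visited states.

    In this toy case the real environment is identical to the model.
    """
    state = start
    trajectory = [state]
    for _ in range(steps):
        state = model_step(state, plan(state, goal))
        trajectory.append(state)
    return trajectory


print(run_episode(start=0, goal=3, steps=5))
```

Exhaustive search only works because the action set and horizon are tiny. Systems in the families the survey covers typically replace it with sampling, learned policies, or tree search, and plan in a learned latent space rather than over raw states.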

A key highlight is the recent convergence of chain-of-thought (CoT) reasoning with world-model imagination, which the survey credits with more robust long-horizon planning and symbolic manipulation. The survey also reviews evaluation protocols and benchmarks, identifying persistent challenges such as compounding prediction errors, sim-to-real transfer gaps, and fragmented evaluation standards. It notes that safety-critical domains such as autonomous driving and medical imaging demand reliable, interpretable world models. Looking ahead, the authors call for unified multimodal world models, foundation-scale interactive simulators trained on diverse data, and safe deployment frameworks. This mapping should help researchers and practitioners navigate an increasingly fragmented field, and may speed progress toward artificial general intelligence.
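The "compounding prediction errors" challenge can be illustrated with a small Python sketch. It is an illustrative toy, not an experiment from the survey: it assumes scalar linear dynamics and a model whose gain is off by 1%, and all names and values are hypothetical. Because the model is fed its own predictions during an imagined rollout, the small one-step error accumulates at every step.

```python
def rollout_errors(true_gain, model_gain, x0, horizon):
    """Return |true state - imagined state| at each step of an open-loop rollout.

    Assumption: both the environment and the model follow x_next = gain * x,
    and the model never sees a real observation after the initial state.
    """
    true_x, model_x = x0, x0
    errors = []
    for _ in range(horizon):
        true_x = true_gain * true_x      # real environment transition
        model_x = model_gain * model_x   # model consumes its own prediction
        errors.append(abs(true_x - model_x))
    return errors


errors = rollout_errors(true_gain=1.0, model_gain=1.01, x0=1.0, horizon=50)
print(f"error after 1 step:   {errors[0]:.4f}")
print(f"error after 50 steps: {errors[-1]:.4f}")
```

In this setup a 1% one-step error grows to roughly 64% of the initial state after 50 imagined steps, which is why long-horizon rollouts are hard to trust without periodic correction from real observations.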

Key Points
  • Defines a 4-axis taxonomy: architecture, methodological family, reasoning strategy, and application domain, with nine application fields listed from robotics to finance.
  • Covers milestone systems and their contributions: PlaNet (2018), the Dreamer family, MuZero, Sora, Cosmos, and Genie.
  • Identifies the convergence of chain-of-thought reasoning with world-model imagination, and flags open challenges such as compounding prediction errors, sim-to-real transfer gaps, and fragmented evaluation standards.

Why It Matters

A unified world-model framework could accelerate AGI research and support safer deployment in robotics, autonomous driving, and scientific modeling.