HY-World 2.0: A Multi-Modal World Model for Reconstructing, Generating, and Simulating 3D Worlds
The open-source model performs comparably to closed-source rivals such as Marble, creating navigable 3D scenes from a text prompt or a single image.
Team HY-World, a large collaborative research group, has unveiled HY-World 2.0, a significant upgrade to its multi-modal world model framework. This open-source system can generate complete, navigable 3D worlds from inputs as simple as a text prompt or a single image. The architecture uses a four-stage pipeline: generating a panoramic view, planning a trajectory through the scene, expanding the world with consistent 3D geometry, and finally composing the full environment. A key innovation is the upgraded WorldStereo 2.0 model, which uses a "consistent memory" mechanism to keep the scene visually coherent across different viewpoints as it is built.
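To make the staged design concrete, here is a minimal Python sketch of how four stages like these could be chained, each consuming the output of the one before it. It is an illustration only and not HY-World 2.0's actual code or API: `WorldState`, the stage functions, and `run_pipeline` are hypothetical names, and the stage bodies are string placeholders standing in for the real models.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Optional


@dataclass
class WorldState:
    """Hypothetical container handed from stage to stage (not the project's API)."""
    prompt: str
    panorama: Optional[str] = None
    trajectory: List[str] = field(default_factory=list)
    geometry: List[str] = field(default_factory=list)
    world: Optional[str] = None
    completed: List[str] = field(default_factory=list)


def generate_panorama(state: WorldState) -> WorldState:
    # Stage 1: placeholder for producing a panoramic view from the prompt.
    state.panorama = f"panorama({state.prompt})"
    state.completed.append("panorama")
    return state


def plan_trajectory(state: WorldState) -> WorldState:
    # Stage 2: placeholder for planning camera viewpoints through the scene.
    if state.panorama is None:
        raise ValueError("trajectory planning needs a panorama")
    state.trajectory = [f"view_{i}" for i in range(3)]
    state.completed.append("trajectory")
    return state


def expand_world(state: WorldState) -> WorldState:
    # Stage 3: placeholder producing one geometry chunk per planned viewpoint.
    if not state.trajectory:
        raise ValueError("world expansion needs a trajectory")
    state.geometry = [f"geometry({view})" for view in state.trajectory]
    state.completed.append("expansion")
    return state


def compose_world(state: WorldState) -> WorldState:
    # Stage 4: placeholder for merging all chunks into one environment.
    if not state.geometry:
        raise ValueError("composition needs expanded geometry")
    state.world = f"world[{len(state.geometry)} chunks]"
    state.completed.append("composition")
    return state


# The order mirrors the four stages described in the article.
STAGES: List[Callable[[WorldState], WorldState]] = [
    generate_panorama,
    plan_trajectory,
    expand_world,
    compose_world,
]


def run_pipeline(prompt: str) -> WorldState:
    """Run every stage in order, threading the shared state through."""
    state = WorldState(prompt=prompt)
    for stage in STAGES:
        state = stage(state)
    return state


if __name__ == "__main__":
    result = run_pipeline("a foggy harbor at dawn")
    print(result.completed)
    print(result.world)
```

Each stage checks that its predecessor's output exists before running, which is one simple way to express the strict ordering the pipeline implies.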
To make these generated worlds usable, the team also introduced WorldLens, a high-performance 3D Gaussian Splatting (3DGS) rendering platform. WorldLens features an engine-agnostic design, automatic lighting, and efficient collision detection, allowing for real-time, interactive exploration of the synthesized 3D spaces, complete with support for adding characters. Extensive benchmarking shows HY-World 2.0 achieves state-of-the-art results among open-source approaches, with performance comparable to leading closed-source models like Marble. By releasing all model weights, code, and technical details, the team aims to democratize advanced 3D world generation and simulation, providing a powerful foundation for future research in gaming, virtual reality, robotics, and digital twin creation.
- Generates navigable 3D Gaussian Splatting scenes from text or a single image via a four-stage pipeline.
- Introduces WorldLens, a new rendering platform for real-time exploration with lighting and collision detection.
- Achieves state-of-the-art results among open-source approaches, comparable to the closed-source Marble, with all code and weights publicly released.
Why It Matters
Democratizes high-end 3D world creation for game dev, VR, and simulation, challenging proprietary model dominance.