Image & Video

Wayve's Waypoint-1.5 open-source world model runs 60fps on consumer GPUs

Trained on 100 hours of FPS gameplay, this new model predicts future video frames for autonomous agents.

Deep Dive

Wayve, a UK-based autonomous vehicle startup, has released Waypoint-1.5, an open-source 'world model' for AI agents. Unlike large language models, which predict text, world models predict future states of an environment, a critical capability for robots and autonomous systems that must plan their actions. Waypoint-1.5 was trained on 100 hours of curated gameplay from the first-person shooter (FPS) 'Counter-Strike 2', which provides a rich, action-oriented simulation of a 3D world where an agent's decisions have immediate consequences.

Waypoint-1.5 is a video prediction model: it takes a single image as input and generates a sequence of possible future frames. An AI agent can use these predictions to ask 'what would happen if I took this action?' before committing to a real-world move. A key technical achievement is efficiency: the model is designed to run inference at 60 frames per second, a budget of roughly 16.7 ms per frame, on a consumer-grade NVIDIA RTX 4090 GPU. That throughput makes it practical for real-time agent training and simulation and lowers the barrier to entry for researchers and developers experimenting with embodied AI.
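The 'what would happen if I took this action?' loop can be illustrated with a minimal sketch of random-shooting planning. Everything here is an assumption made for illustration: `toy_world_model` is a hypothetical stand-in that moves a point on a grid, not Waypoint-1.5's actual interface, and a real world model would predict video frames rather than coordinates.

```python
import random
from typing import Callable, List, Sequence, Tuple

State = Tuple[float, float]   # toy stand-in for a video frame: an (x, y) position
Action = Tuple[float, float]  # toy stand-in for a control input: a (dx, dy) step

# Hypothetical action set for the toy environment.
MOVES: List[Action] = [(1.0, 0.0), (-1.0, 0.0), (0.0, 1.0), (0.0, -1.0)]


def toy_world_model(state: State, action: Action) -> State:
    """Hypothetical world model: predict the next state from state and action."""
    return (state[0] + action[0], state[1] + action[1])


def rollout(model: Callable[[State, Action], State],
            state: State,
            actions: Sequence[Action]) -> List[State]:
    """Imagine a future by feeding each predicted state back into the model."""
    trajectory = []
    for action in actions:
        state = model(state, action)
        trajectory.append(state)
    return trajectory


def plan(model: Callable[[State, Action], State],
         state: State,
         goal: State,
         horizon: int = 5,
         candidates: int = 64,
         seed: int = 0) -> Action:
    """Sample random action sequences, imagine each one, and return the
    first action of the sequence whose imagined end state is nearest the goal."""
    rng = random.Random(seed)
    best_cost, best_action = float("inf"), MOVES[0]
    for _ in range(candidates):
        actions = [rng.choice(MOVES) for _ in range(horizon)]
        final = rollout(model, state, actions)[-1]
        # Manhattan distance between the imagined end state and the goal.
        cost = abs(final[0] - goal[0]) + abs(final[1] - goal[1])
        if cost < best_cost:
            best_cost, best_action = cost, actions[0]
    return best_action


if __name__ == "__main__":
    print(plan(toy_world_model, state=(0.0, 0.0), goal=(3.0, 0.0)))
```

The agent never touches the real environment while planning; it only queries the model. This is why inference speed matters: each candidate sequence costs `horizon` model calls, so a faster model allows more imagined futures per decision.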

By open-sourcing both the model weights and the training code, Wayve opens up research in a domain often dominated by proprietary, compute-intensive projects from large tech firms. Using game environments as a training ground is a proven strategy, similar to how DeepMind used StarCraft II: a game provides a complex, rule-based world that is cheaper and safer to experiment in than physical robots or cars. Waypoint-1.5 is a step towards more general and capable AI agents that can understand and plan within dynamic visual environments.

Key Points
  • Trained on 100 hours of curated 'Counter-Strike 2' FPS gameplay, providing a complex 3D action environment.
  • Generates future video frame predictions from a single image at 60fps on a consumer RTX 4090 GPU.
  • Open-sourced model and code to accelerate research into 'world models' for planning AI agents and robotics.

Why It Matters

Waypoint-1.5 makes advanced AI agent research more accessible by providing an efficient, open-source world model that runs in real time on a single consumer GPU.
