Image & Video

Waypoint-1.5: New open-source world model trained on FPS games runs on local consumer GPUs at 60fps

Trained on 100 hours of FPS gameplay, this new model predicts future video frames for autonomous agents.

Deep Dive

Wayve, a UK-based autonomous vehicle startup, has released Waypoint-1.5, a significant open-source contribution to the field of 'world models' for AI agents. Unlike large language models that predict text, world models predict future states of an environment, a critical capability for robots and autonomous systems that must plan actions. Waypoint-1.5 was trained on a novel dataset of 100 hours of curated gameplay from the first-person shooter (FPS) game 'Counter-Strike 2'. This provides a rich, action-oriented simulation of a 3D world where an agent's decisions have immediate consequences.

Waypoint-1.5 is a video prediction model: it takes a single image as input and generates a sequence of potential future frames. This lets an AI agent simulate 'what would happen if I took this action?' before committing to a move. A key technical achievement is its efficiency: Waypoint-1.5 is designed to run inference at 60 frames per second on a consumer-grade NVIDIA RTX 4090 GPU, which leaves a budget of roughly 16.7 ms per generated frame. That speed makes it practical for real-time agent training and simulation, lowering the barrier to entry for researchers and developers experimenting with embodied AI.
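To make the 'what would happen if I took this action?' idea concrete, here is a minimal sketch of planning with a frame predictor. It does not use Waypoint-1.5's actual interface, which is not described here: `ToyWorldModel`, its `predict` method, the action names, and the `score` objective are all hypothetical stand-ins, and a "frame" is just a short list of brightness values. The structure is the point: roll each candidate action forward in imagination, score the predicted outcome, and only then act.

```python
from typing import List, Sequence

Frame = List[float]  # stand-in for an image: a flat list of pixel brightness values

# At 60 fps, each predicted frame must be produced in about 16.7 ms.
FRAME_BUDGET_MS = 1000.0 / 60


class ToyWorldModel:
    """Hypothetical stand-in for a learned video predictor (NOT Waypoint-1.5's API)."""

    # Assumed toy dynamics: each action shifts brightness by a fixed amount.
    _SHIFT = {"left": -0.1, "stay": 0.0, "right": 0.1}

    def predict(self, frame: Frame, action: str) -> Frame:
        shift = self._SHIFT[action]
        # Clamp every pixel to the valid [0, 1] range.
        return [min(1.0, max(0.0, p + shift)) for p in frame]


def rollout(model: ToyWorldModel, frame: Frame, action: str, horizon: int) -> List[Frame]:
    """Imagine `horizon` future frames by feeding each prediction back in."""
    frames = []
    for _ in range(horizon):
        frame = model.predict(frame, action)
        frames.append(frame)
    return frames


def score(frame: Frame) -> float:
    """Hypothetical objective: prefer frames with mean brightness near 0.8."""
    mean = sum(frame) / len(frame)
    return -abs(mean - 0.8)


def choose_action(model: ToyWorldModel, frame: Frame,
                  actions: Sequence[str], horizon: int = 3) -> str:
    """Pick the action whose imagined final frame scores best."""
    return max(actions, key=lambda a: score(rollout(model, frame, a, horizon)[-1]))


if __name__ == "__main__":
    start = [0.5, 0.5, 0.5, 0.5]
    best = choose_action(ToyWorldModel(), start, ["left", "stay", "right"])
    print(f"chosen action: {best}; per-frame budget: {FRAME_BUDGET_MS:.1f} ms")
```

In a real system the toy dynamics would be replaced by a call to the learned model, and the rollout for every candidate action would have to fit within the real-time frame budget, which is why inference speed matters for planning.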

By open-sourcing both the model weights and the training code, Wayve is encouraging research in a domain often dominated by proprietary, compute-intensive projects from large tech firms. The use of game environments as a training ground is a proven strategy, similar to how DeepMind used StarCraft II, because a game provides a complex, rule-based world that is cheaper and safer to experiment in than the physical world of robots or cars. Waypoint-1.5 represents a step towards more general and capable AI agents that can understand and plan within dynamic visual environments.

Key Points
  • Trained on 100 hours of curated 'Counter-Strike 2' FPS gameplay, providing a complex 3D action environment.
  • Generates future video frame predictions from a single image at 60fps on a consumer RTX 4090 GPU.
  • Open-sourced model and code to accelerate research into 'world models' for planning AI agents and robotics.

Why It Matters

Democratizes advanced AI agent research by providing a powerful, efficient world model that runs on affordable hardware.