Media & Culture

Five frontier AI models were asked to code bots to navigate a foggy maze with teleportals. First to the exit wins; over 500 steps and you're eliminated. The Gemini, ChatGPT, and Mimo bots never made it past Round 8. Here are Claude's and Grok's bots playing Round 93.

Claude and Grok bots survive to Round 93 while ChatGPT, Gemini, and Mimo fail before Round 8.

Deep Dive

A viral AI benchmark has emerged, pitting five leading language models (Anthropic's Claude, xAI's Grok, OpenAI's ChatGPT, Google's Gemini, and Xiaomi's Mimo) against a challenging spatial reasoning task. The competition required each model to code an autonomous bot capable of navigating a procedurally generated maze under severe constraints. The bots started with zero prior knowledge of the maze layout and operated under a "fog of war" that revealed only the 5×5 grid of cells around their current position. The environment included standard walls, an exit in a far corner, and teleportals that could warp a bot across the map. The core challenge was for each AI to build an accurate mental map from these partial, local observations and devise an efficient pathfinding strategy, all while racing against a strict 500-step elimination limit.
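The actual bot code and harness aren't public, but the core strategy the paragraph describes (merging each 5×5 observation into a growing global map, then pathfinding toward the exit if it has been seen, or otherwise toward the nearest unexplored "frontier" cell) can be sketched minimally. Everything below is illustrative: the cell symbols, function names, and map representation are assumptions, not the benchmark's real interface.

```python
from collections import deque

# Assumed cell symbols; the real benchmark's encoding is unknown.
WALL, EXIT = "#", "E"

def merge_view(known, pos, view):
    """Merge a 5x5 local observation (centered on pos) into the global map,
    stored as a dict mapping (row, col) -> symbol."""
    r0, c0 = pos[0] - 2, pos[1] - 2
    for dr, row in enumerate(view):
        for dc, cell in enumerate(row):
            known[(r0 + dr, c0 + dc)] = cell

def next_move(known, pos):
    """BFS over the known map toward the exit if it has been observed,
    otherwise toward the nearest frontier (an unseen cell bordering a
    known open cell). Returns the first step of that path, or None."""
    queue = deque([(pos, None)])  # (cell, first step taken to reach it)
    seen = {pos}
    while queue:
        cell, first = queue.popleft()
        if known.get(cell) == EXIT or cell not in known:
            return first  # head toward the exit or an unexplored frontier
        if known[cell] == WALL:
            continue  # can't pass through walls
        r, c = cell
        for nxt in ((r - 1, c), (r + 1, c), (r, c - 1), (r, c + 1)):
            if nxt not in seen:
                seen.add(nxt)
                queue.append((nxt, first or nxt))
    return None  # fully explored, no exit reachable
```

A real contender would also need to track teleportal pairs (e.g. recording where the bot lands after stepping on one) and re-plan after every warp, since a teleport invalidates any straight-line path assumption.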

The results revealed a stark performance divide. Models from OpenAI (ChatGPT), Google (Gemini), and Xiaomi (Mimo) consistently failed to create bots that could survive beyond the eighth tournament round, often exceeding the step limit or getting trapped. In contrast, the bots programmed by Anthropic's Claude and xAI's Grok demonstrated remarkably robust strategies, surviving deep into the competition and facing off in a showcased Round 93. Their success suggests advanced capabilities in iterative exploration, state tracking, and long-horizon planning: skills where the other models faltered. This tournament, while informal, acts as a compelling proxy for evaluating an AI's ability to reason under uncertainty, manage complex state, and execute multi-step plans, all of which are critical for real-world applications like robotics and autonomous systems.

Key Points
  • The tournament tested five AI models: Claude, Grok, ChatGPT, Gemini, and Mimo on a complex maze navigation task.
  • Bots were limited to a 5×5 local view and had to map teleportals and walls with a strict 500-step limit.
  • Claude and Grok advanced to Round 93, while ChatGPT, Gemini, and Mimo were eliminated before Round 8.

Why It Matters

Highlights which AI models excel at long-term planning and reasoning under uncertainty—key for robotics and autonomous agents.