Open Source

DoomVLM is now open source - VLMs playing Doom

An open-source tool lets vision language models such as Qwen play Doom deathmatches, with response times as low as 0.5 seconds on GPU hardware.

Deep Dive

Developer MrFelliks has released DoomVLM as an open-source project, transforming the classic first-person shooter Doom into a benchmarking arena for vision language models (VLMs). The tool, built as a single Jupyter notebook under an MIT license, uses the ViZDoom environment to capture game screenshots, overlays them with a numbered grid, and sends them to any VLM via an OpenAI-compatible API. Models receive two tools—shoot(column) and move(direction)—and must make real-time decisions without any reinforcement learning or fine-tuning, relying purely on visual inference. The system supports any API endpoint, including LM Studio, Ollama, vLLM, OpenRouter, and direct connections to OpenAI or Anthropic's Claude.
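The notebook's internals aren't reproduced in this article, but the described pipeline (grid-overlaid screenshots plus two callable tools) can be sketched roughly as follows. The helper name, column count, and parameter schemas are illustrative assumptions, not the project's actual code:

```python
import numpy as np

GRID_COLS = 8  # assumed column count for the aiming grid (illustrative)

def add_grid_overlay(frame: np.ndarray, cols: int = GRID_COLS) -> np.ndarray:
    """Draw vertical divider lines so the model can aim by column index.

    `frame` is an (H, W, 3) screenshot array, e.g. from ViZDoom's
    screen buffer.
    """
    out = frame.copy()
    h, w, _ = out.shape
    for c in range(1, cols):
        x = c * w // cols
        out[:, x, :] = 255  # white 1-px divider between columns
    return out

# Tool schemas in the OpenAI function-calling format, matching the
# shoot(column)/move(direction) interface the article describes.
TOOLS = [
    {
        "type": "function",
        "function": {
            "name": "shoot",
            "description": "Fire at the target in the given grid column.",
            "parameters": {
                "type": "object",
                "properties": {
                    "column": {
                        "type": "integer",
                        "minimum": 0,
                        "maximum": GRID_COLS - 1,
                    }
                },
                "required": ["column"],
            },
        },
    },
    {
        "type": "function",
        "function": {
            "name": "move",
            "description": "Move in one of four directions.",
            "parameters": {
                "type": "object",
                "properties": {
                    "direction": {
                        "type": "string",
                        "enum": ["forward", "back", "left", "right"],
                    }
                },
                "required": ["direction"],
            },
        },
    },
]
```

Because the schemas follow the standard OpenAI tool format, the same payload works against any of the compatible backends mentioned above (LM Studio, Ollama, vLLM, OpenRouter, or the hosted APIs).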

The major update introduces two competitive deathmatch modes: a benchmark mode where models take turns under identical conditions for fair comparison, and an arena mode where up to four models play simultaneously via multiprocessing, with faster inference granting more turns. Each agent is fully configurable with custom system prompts, tool descriptions, and sampling parameters. The notebook records episodes as GIF/MP4 files with overlays showing health, ammo, model decisions, and latency, while maintaining a live scoreboard. Performance varies dramatically by hardware—from 10 seconds per step on a MacBook M1 Pro to 0.5 seconds on a RunPod L40S GPU—making GPU acceleration essential for proper arena gameplay.
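The arena pacing rule (lower latency earns more turns) can be illustrated with a simplified single-process simulation. The real notebook runs agents concurrently via multiprocessing; the latencies below reuse the article's hardware figures but the class and function names are invented for this sketch:

```python
from dataclasses import dataclass

@dataclass
class Agent:
    name: str
    latency_s: float   # seconds per model call
    turns: int = 0
    clock: float = 0.0  # simulated wall-clock time consumed so far

def run_arena(agents, budget_s: float):
    """Count how many turns each agent completes within one episode budget.

    Each agent acts again as soon as its previous inference finishes, so a
    faster model simply fits more actions into the same episode.
    """
    for a in agents:
        while a.clock + a.latency_s <= budget_s:
            a.clock += a.latency_s
            a.turns += 1
    return {a.name: a.turns for a in agents}

scores = run_arena(
    [Agent("gpu-model", latency_s=0.5), Agent("laptop-model", latency_s=10.0)],
    budget_s=60.0,
)
# In a 60-second episode, 0.5 s/step yields 120 turns vs. 6 turns at 10 s/step
```

This is why the article calls GPU acceleration essential: at MacBook-class latencies an agent barely acts before the match ends.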

MrFelliks notes that simpler, shorter prompts generally yield better results than detailed instructions, and while the tool supports flagship models like GPT-4o and Claude 3.5, comprehensive testing is left to the community. The project now shifts to an exploration phase, inviting users to experiment with different model-prompt combinations and share findings about which VLMs actually survive in Doom's chaotic environments.
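Since each agent takes a custom system prompt and sampling parameters, the short-versus-verbose prompt comparison is easy to set up. The field names and model identifier below are illustrative assumptions, not the notebook's actual configuration schema:

```python
# Hypothetical per-agent configuration for an OpenAI-compatible endpoint.
short_prompt_agent = {
    "model": "qwen2.5-vl-7b-instruct",  # any vision model the endpoint serves
    "system_prompt": "You are playing Doom. Shoot enemies, dodge attacks.",
    "temperature": 0.2,   # low temperature for decisive actions
    "max_tokens": 64,     # a turn only needs one short tool call
}

# Same agent with a detailed prompt -- the kind the author found to
# underperform the terse version above.
verbose_prompt_agent = {
    **short_prompt_agent,
    "system_prompt": (
        "You are an expert Doom deathmatch player. First analyse the numbered "
        "grid, then reason about enemy positions, line of sight, ammunition, "
        "and movement options before choosing exactly one action."
    ),
}
```

Running both configurations against the same benchmark-mode seed is the kind of controlled comparison the project's exploration phase invites.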

Key Points
  • Supports any OpenAI-compatible VLM API (LM Studio, Ollama, vLLM, OpenRouter) for models from 0.8B to 9B parameters
  • Features two deathmatch modes: benchmark (fair turn-based) and arena (simultaneous with 4 models via multiprocessing)
  • Records gameplay with performance overlays (HP, ammo, decisions, latency) and achieves 0.5-second response times on GPU hardware

Why It Matters

Provides a standardized, visual testbed for comparing VLM reasoning speed and quality in real-time, interactive scenarios beyond static benchmarks.