RON-TAC: Closed-Loop Imitation Learning for Cooperative Tactical AI in Ready or Not (UE5.3)
A 40M-parameter AI policy autonomously issues tactical commands at 60.6% macro validation accuracy, running live at 2 Hz with under 500 ms end-to-end latency.
A developer has created RON-TAC, an AI system that trains cooperative tactical agents directly from human gameplay inside the commercial SWAT simulator Ready or Not. At its core is a DAgger-style imitation learning pipeline. A lightweight C++ mod (3.8k lines of code) hooks into the game at runtime, capturing 384×384 RGB frames and logging every player command and game-state snapshot to a JSONL file, building a dataset of human demonstrations for the policy to learn from.
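As a rough sketch of what one demonstration record in that JSONL log might look like: the field names below (`frame`, `command`, `team`, `state`) and the `log_demo` helper are illustrative assumptions, not RON-TAC's actual schema.

```python
import json
import time

def log_demo(fp, frame_path, command, team, state):
    """Append one human demonstration as a single JSONL record (hypothetical schema)."""
    record = {
        "ts": time.time(),    # capture timestamp
        "frame": frame_path,  # path to the saved 384x384 RGB frame
        "command": command,   # player-issued tactical command label
        "team": team,         # which squad element the command targets
        "state": state,       # structured game-state snapshot
    }
    fp.write(json.dumps(record) + "\n")  # one JSON object per line (JSONL)

with open("demos.jsonl", "a") as fp:
    log_demo(fp, "frames/000123.png", "BREACH", "RED",
             {"door": "locked", "suspects_visible": 1})
```

One-record-per-line JSONL keeps logging cheap inside the game loop and lets the Python side stream the file incrementally instead of re-parsing it whole.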
A Python live-inference loop built on PyTorch and CUDA processes the latest game data. A vision transformer (T3-Vis) handles visual understanding, and a 39.9M-parameter set-transformer (T3-Tac) consumes both the visual embeddings and structured scene data. The model outputs one of 18 discrete tactical commands, such as BREACH, STACK_UP, or ARREST_TARGET, along with team assignments. When the model's confidence clears a threshold, the mod dispatches the command back into the game, creating a human-in-the-loop system in which the player can override AI decisions at any time.
The system is trained on a growing dataset of 1,173 player-issued commands and achieves 60.6% macro validation accuracy. Per-command performance varies, from perfect accuracy on HOLD to 64% on BREACH. In live operation on an RTX 5090 laptop, the pipeline runs at 2 Hz with under 500 ms of end-to-end latency and no noticeable impact on game performance. The attached demonstration video shows the AI squad autonomously executing compound-clearing maneuvers, issuing verbal commands, and adapting to dynamic hostage situations, with 23 autonomous commands issued in 90 seconds.
- Uses a DAgger-style imitation learning pipeline with a 40M-parameter T3-Tac transformer model to learn from human gameplay.
- Achieves 60.6% macro validation accuracy on 18 tactical commands, with perfect performance on the HOLD command.
- Runs live inference at 2Hz with <500ms latency on an RTX 5090, allowing real-time human oversight and correction.
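For readers unfamiliar with the metric in the second bullet: macro accuracy averages per-command accuracies so that each of the 18 commands counts equally, regardless of how often it appears in the validation set. A minimal sketch (the toy labels below are invented, not RON-TAC's data):

```python
from collections import defaultdict

def macro_accuracy(y_true, y_pred):
    """Mean of per-class accuracies: rare commands weigh as much as common ones."""
    correct = defaultdict(int)
    total = defaultdict(int)
    for t, p in zip(y_true, y_pred):
        total[t] += 1
        correct[t] += (t == p)
    return sum(correct[c] / total[c] for c in total) / len(total)

# Toy example: HOLD predicted perfectly (2/2), BREACH at 2/3
y_true = ["HOLD", "HOLD", "BREACH", "BREACH", "BREACH"]
y_pred = ["HOLD", "HOLD", "BREACH", "BREACH", "HOLD"]
print(round(macro_accuracy(y_true, y_pred), 3))  # → 0.833
```

This is why the headline 60.6% figure is a stricter number than plain accuracy would be: frequent commands like HOLD cannot mask poor performance on rare ones.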
Why It Matters
Demonstrates a practical, low-latency pipeline for training complex multi-agent AI directly in commercial game engines, advancing simulation-to-real research.