MAPLE: New framework trains self-driving AI without simulators, sets state of the art on Bench2Drive
No more brittle imitation learning: MAPLE uses latent multi-agent play for robust driving.
Autonomous driving models based on vision-language-action (VLA) architectures often fail in closed-loop settings due to brittle imitation learning. Traditional closed-loop supervision lacks scalability and fails to model reactive environments. To address this, a team of researchers introduces MAPLE (Latent Multi-Agent Play for End-to-End Autonomous Driving), a novel framework that performs reactive, multi-agent rollouts entirely in the latent space of the VLA model.
MAPLE works by independently controlling the ego vehicle and nearby traffic agents over multi-step horizons while keeping the agents reactive to one another. This enables realistic closed-loop training without external simulators, which are computationally expensive and limited in visual fidelity. The framework has two stages: supervised fine-tuning on latent rollouts derived from ground-truth trajectories, followed by reinforcement learning with global and agent-specific rewards that encourage safety, progress, and interaction realism. Additionally, diversity rewards push the model to explore planning behaviors not present in logged driving data.
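To make the idea of a reactive multi-agent rollout concrete, here is a minimal toy sketch in Python. It is not MAPLE's implementation: the one-dimensional "latent" (a single position value), the `toy_policy` and `toy_transition` functions, the reward weights, and the safety threshold are all hypothetical stand-ins chosen for illustration. The sketch only shows the overall shape: each agent is controlled independently, every agent acts on the others' current states, and a global safety term is combined with an ego-specific progress term.

```python
from dataclasses import dataclass
from typing import List, Sequence

Latent = List[float]  # hypothetical stand-in for one agent's latent state


@dataclass
class RolloutStep:
    latents: List[Latent]  # one latent per agent after the step, ego at index 0
    actions: List[float]   # one action per agent, ego at index 0


def toy_policy(own: Latent, others: Sequence[Latent]) -> float:
    """Hypothetical policy: move forward unless another agent is close ahead."""
    gaps = [o[0] - own[0] for o in others if o[0] > own[0]]
    gap = min(gaps, default=float("inf"))
    return 1.0 if gap > 2.0 else 0.0


def toy_transition(latent: Latent, action: float) -> Latent:
    """Hypothetical latent dynamics: position advances by the chosen speed."""
    return [latent[0] + action]


def latent_rollout(initial: Sequence[Latent], horizon: int) -> List[RolloutStep]:
    """Roll all agents forward together, with no external simulator involved."""
    latents = [list(z) for z in initial]
    trajectory = []
    for _ in range(horizon):
        # Each agent is controlled independently but sees the others' current
        # latents, which is what makes the rollout reactive.
        actions = [
            toy_policy(z, latents[:i] + latents[i + 1:])
            for i, z in enumerate(latents)
        ]
        latents = [toy_transition(z, a) for z, a in zip(latents, actions)]
        trajectory.append(RolloutStep(latents, actions))
    return trajectory


def total_reward(trajectory: Sequence[RolloutStep],
                 w_safety: float = 1.0,
                 w_progress: float = 0.5) -> float:
    """Combine a global safety term with an ego-specific progress term.

    The weights and the 1.0 distance threshold are assumptions for this sketch.
    """
    final = trajectory[-1].latents
    min_gap = min(abs(final[0][0] - z[0]) for z in final[1:])
    safety = 1.0 if min_gap >= 1.0 else 0.0
    progress = sum(step.actions[0] for step in trajectory)
    return w_safety * safety + w_progress * progress


if __name__ == "__main__":
    # Ego starts at 0.0, one traffic agent starts 1.5 ahead.
    rollout = latent_rollout([[0.0], [1.5]], horizon=3)
    print([step.actions for step in rollout])
    print(total_reward(rollout))
```

In this toy setup the ego holds back on the first step because the other agent is too close, then proceeds once the gap opens. A real system would replace the scalar latents and hand-written rules with the VLA model's learned latent states and policy.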
MAPLE achieves state-of-the-art driving performance on the Bench2Drive benchmark, suggesting that scalable closed-loop multi-agent play leads to more robust end-to-end autonomous driving systems. By removing the need for external simulators during training, MAPLE offers a practical path toward safer self-driving models that can handle dynamic, real-world interactions.
- MAPLE uses latent-space rollouts to simulate multi-agent reactive driving scenarios without external simulators.
- Two-stage training: supervised fine-tuning on latent rollouts derived from ground-truth trajectories, then RL with safety, progress, interaction-realism, and diversity rewards.
- Achieves state-of-the-art performance on Bench2Drive while avoiding the computational cost and visual-fidelity limits of simulator-based training.
Why It Matters
Scalable closed-loop training without simulators could accelerate robust real-world autonomous driving deployment.