CoPark's self-play RL achieves 70-85% reactive parking success with 3-6% collision rate
Autonomous parking that yields, queues, and reverses — all learned via self-play
CoPark tackles two conflicting goals: high-precision parking and safe interaction with other vehicles. It layers a learned residual head on a precomputed parking plan (the prior). The key innovation is how that prior is released: the release is modulated by the threat a partner vehicle poses, and it is asymmetric across control channels. As a continuous threat signal rises, longitudinal control shifts to the residual head so the vehicle can yield, while lateral control stays anchored to the offline plan, preserving sub-meter slot alignment. A closed-loop refinement layer corrects residual errors from action-grid discretization. The policy is trained on six parking lots and evaluated zero-shot on the new reactive-parking benchmark, built on Dragon Lake Parking (DLP) and DeepScenario Open 3D (DSC3D).
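The threat-gated, channel-asymmetric release described above can be illustrated with a minimal sketch. This is not CoPark's actual implementation: the linear blending rule, the `Action` type, the function name `release_prior`, and the per-channel gains `long_release` and `lat_release` are all hypothetical, chosen only to show how one threat signal might hand the longitudinal channel to the residual head while leaving the lateral channel on the plan.

```python
from dataclasses import dataclass


@dataclass
class Action:
    accel: float  # longitudinal command (e.g. m/s^2)
    steer: float  # lateral command (e.g. rad)


def release_prior(
    prior: Action,
    residual: Action,
    threat: float,
    long_release: float = 1.0,  # assumed: longitudinal channel may be fully released
    lat_release: float = 0.0,   # assumed: lateral channel stays on the offline plan
) -> Action:
    """Blend a plan prior with a residual head, gated per channel by threat.

    Hypothetical linear blend: each channel moves from the prior toward the
    residual in proportion to (channel gain * threat).
    """
    threat = min(max(threat, 0.0), 1.0)  # clamp the threat signal to [0, 1]
    w_long = long_release * threat
    w_lat = lat_release * threat
    return Action(
        accel=(1.0 - w_long) * prior.accel + w_long * residual.accel,
        steer=(1.0 - w_lat) * prior.steer + w_lat * residual.steer,
    )


if __name__ == "__main__":
    plan = Action(accel=1.0, steer=0.2)      # offline plan: keep moving, steer into slot
    reactive = Action(accel=-2.0, steer=-0.5)  # residual head: brake to yield
    for threat in (0.0, 0.5, 1.0):
        print(threat, release_prior(plan, reactive, threat))
```

With these assumed gains, a rising threat moves only the acceleration command toward the residual head's braking output; the steering command stays equal to the plan's value, which is the property the summary credits with preserving slot alignment.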
Results show CoPark achieves roughly 70-85% success with a collision rate of only 3-6%, outperforming classical planners, imitation learning, and large-scale RL baselines. Importantly, the system exhibits emergent interactive behaviors such as reverse-yielding, mid-maneuver yielding, tight-corridor passing, and queuing, none of which were explicitly programmed. This suggests that self-play RL on top of robust plan priors can produce safe, socially aware autonomous driving maneuvers in constrained environments.
- Residual-policy architecture: a precomputed plan provides a fixed prior, and a learned head handles reactive corrections
- 70-85% success rate with 3-6% collisions on zero-shot benchmarks (DLP, DSC3D)
- Emergent behaviors: reverse-yielding, queuing, tight-corridor passing — all unscripted
Why It Matters
CoPark demonstrates that self-play RL can safely coordinate multiple autonomous vehicles in tight spaces, reducing collisions while achieving high parking precision.