KL Regularization alone achieves faster convergence in offline zero-sum games
A new paper shows that KL regularization alone can achieve an O(1/n) convergence rate in the dataset size n, improving on the standard O(1/√n).
Deep Dive
A new study shows that KL regularization alone is enough to stabilize offline learning in two-player zero-sum Markov games. The authors' ROSE framework achieves a fast O(1/n) convergence rate under unilateral concentrability (a coverage assumption requiring the offline data to cover the Nash equilibrium and its unilateral deviations), improving on the standard O(1/√n). They also propose SOS-MD, a practical model-free algorithm with last-iterate convergence to Nash equilibria.
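To see what the gap between O(1/n) and O(1/√n) means in data, here is a back-of-the-envelope calculation. It assumes, as a simplification the summary does not state, that the equilibrium error after n samples is exactly C/n or C/√n with the same constant C and no log factors. Solving each for the number of samples needed to reach a target error ε:

```latex
\varepsilon_{\text{fast}}(n) = \frac{C}{n} \;\Longrightarrow\; n_{\text{fast}}(\varepsilon) = \frac{C}{\varepsilon},
\qquad
\varepsilon_{\text{slow}}(n) = \frac{C}{\sqrt{n}} \;\Longrightarrow\; n_{\text{slow}}(\varepsilon) = \frac{C^{2}}{\varepsilon^{2}},
\qquad
\frac{n_{\text{slow}}}{n_{\text{fast}}} = \frac{C}{\varepsilon}.
```

With C = 1 and ε = 0.01, the O(1/n) rate needs on the order of 100 samples and the O(1/√n) rate on the order of 10,000. Under these assumptions the advantage is not a fixed multiple: it widens as the target accuracy tightens.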
Key Points
- The ROSE framework achieves O(1/n) convergence under unilateral concentrability, versus the standard O(1/√n)
- SOS-MD is a model-free algorithm using least-squares value estimation and self-play
- KL regularization alone suffices: no explicit pessimism or exploration bonuses are needed
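The summary does not give the SOS-MD update, so the sketch below is not the authors' algorithm. It swaps in a much simpler setting to illustrate the last point: a single-state zero-sum matrix game with a known payoff matrix (no offline data, no least-squares value estimation), solved by simultaneous self-play with a magnetic-mirror-descent-style update, where each player's step is pulled toward a uniform reference policy by a KL penalty of weight `tau`. With only that KL term, the last iterates (no averaging) settle on a single KL-regularized equilibrium, which approximates a Nash equilibrium when `tau` is small. The function names, payoff matrix, and the `tau`/`eta` values are hypothetical choices for illustration.

```python
import numpy as np


def softmax(logits: np.ndarray) -> np.ndarray:
    """Numerically stable softmax of a 1-D array."""
    z = np.exp(logits - logits.max())
    return z / z.sum()


def kl_mirror_descent_selfplay(A: np.ndarray, tau: float = 0.2, eta: float = 0.05,
                               iters: int = 5000):
    """Hypothetical sketch: simultaneous KL-regularized mirror descent on max_x min_y x^T A y.

    Each player optimizes its own payoff minus tau * KL(policy || uniform reference).
    Returns the LAST iterates (x, y); no iterate averaging is used.
    """
    m, k = A.shape
    log_ref_x = np.full(m, -np.log(m))  # log of the uniform reference policy
    log_ref_y = np.full(k, -np.log(k))
    x = np.full(m, 1.0 / m)
    y = np.full(k, 1.0 / k)
    for _ in range(iters):
        g_x = A @ y       # payoff gradient for the maximizing row player
        g_y = -A.T @ x    # payoff gradient for the minimizing column player
        # Closed-form KL-proximal step: a KL pull toward the previous iterate
        # plus a tau-weighted KL pull toward the reference policy.
        x_next = softmax((np.log(x) + eta * tau * log_ref_x + eta * g_x) / (1 + eta * tau))
        y_next = softmax((np.log(y) + eta * tau * log_ref_y + eta * g_y) / (1 + eta * tau))
        x, y = x_next, y_next  # simultaneous update for both players
    return x, y


if __name__ == "__main__":
    # Illustrative, slightly asymmetric rock-paper-scissors payoffs (entries in [-1, 1]).
    A = np.array([[0.0, -1.0, 0.5],
                  [1.0, 0.0, -1.0],
                  [-0.5, 1.0, 0.0]])
    x, y = kl_mirror_descent_selfplay(A)
    print("row policy:", np.round(x, 3))
    print("col policy:", np.round(y, 3))
    # At the regularized equilibrium each policy is a softmax best response to the other.
    print("fixed-point residual:", np.abs(x - softmax(A @ y / 0.2)).max())
```

The step size is kept small relative to `tau` (here `eta = 0.05` with `tau = 0.2` and payoffs bounded by 1), the regime in which this kind of regularized update is known to converge in its last iterate; larger steps or a vanishing `tau` can reintroduce the cycling seen in unregularized self-play.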
Why It Matters
Faster convergence in multi-agent offline learning means fewer logged samples are needed to reach a given equilibrium accuracy, making it cheaper to train AI opponents and other game-theoretic systems from fixed datasets.