Research & Papers

KL regularization achieves faster O(1/n) convergence in offline zero-sum Markov games

New paper shows KL regularization alone can hit O(1/n) rates, beating standard O(1/√n).

Deep Dive

A new study shows that KL regularization alone is enough to stabilize learning in offline two-player zero-sum Markov games. The authors' ROSE framework achieves a fast O(1/n) convergence rate under unilateral concentrability, improving on the standard O(1/√n) rate. They also propose SOS-MD, a practical model-free algorithm with last-iterate convergence to Nash equilibria.
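
To make the gap between the two rates concrete, the Python sketch below computes how many samples each rate would need to reach a target error. It assumes the bounds are exactly c/n and c/√n with the same constant c, which the summary does not state. The function name samples_needed and the parameter c are illustrative only.

```python
import math

def samples_needed(eps, rate, c=1.0):
    """Smallest n whose bound is <= eps.

    Assumes the bound is c/n for rate "fast" and c/sqrt(n) for rate "slow".
    The constant c is hypothetical; real constants differ between analyses.
    """
    if rate == "fast":
        # c / n <= eps  <=>  n >= c / eps
        return math.ceil(c / eps)
    # c / sqrt(n) <= eps  <=>  n >= (c / eps) ** 2
    return math.ceil((c / eps) ** 2)

if __name__ == "__main__":
    for eps in (0.5, 0.125, 0.03125):
        fast = samples_needed(eps, "fast")
        slow = samples_needed(eps, "slow")
        print(f"eps={eps}: fast needs {fast}, slow needs {slow}")
```

Under these assumptions the sample requirement of the slow rate is the square of the fast one, so the advantage grows as the target error shrinks and is not a fixed multiplier.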

Key Points
  • ROSE framework achieves O(1/n) convergence under unilateral concentrability vs standard O(1/√n)
  • SOS-MD is a model-free algorithm using least-squares value estimation and self-play
  • KL regularization alone suffices — no need for explicit pessimism or exploration bonuses
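
As a rough illustration of how KL regularization and self-play fit together, here is a minimal Python sketch of a KL-regularized mirror-descent update on a small matrix game. It is not the paper's SOS-MD algorithm: it uses exact payoffs instead of least-squares value estimates from offline data, and a single-state game (rock-paper-scissors) instead of a Markov game. The names kl_md_step and self_play and the values of eta and tau are assumptions chosen for the example.

```python
import math

def kl_md_step(policy, ref, payoffs, eta, tau):
    """One KL-regularized mirror-descent step on the probability simplex.

    Maximizes eta*<payoffs, p> - eta*tau*KL(p || ref) - KL(p || policy).
    Closed form: p is proportional to
    (policy * ref**(eta*tau) * exp(eta*payoffs)) ** (1 / (1 + eta*tau)).
    """
    power = 1.0 / (1.0 + eta * tau)
    weights = [
        (p * r ** (eta * tau) * math.exp(eta * q)) ** power
        for p, r, q in zip(policy, ref, payoffs)
    ]
    total = sum(weights)
    return [w / total for w in weights]

def self_play(A, x, y, ref_x, ref_y, eta=0.1, tau=0.5, steps=2000):
    """Both players update simultaneously against each other's current policy."""
    rows, cols = len(x), len(y)
    for _ in range(steps):
        # Row player maximizes x^T A y, so its payoff vector is A y.
        qx = [sum(A[i][j] * y[j] for j in range(cols)) for i in range(rows)]
        # Column player minimizes x^T A y, so its payoff vector is -A^T x.
        qy = [-sum(A[i][j] * x[i] for i in range(rows)) for j in range(cols)]
        x, y = (kl_md_step(x, ref_x, qx, eta, tau),
                kl_md_step(y, ref_y, qy, eta, tau))
    return x, y

# Rock-paper-scissors payoffs for the row player (illustrative game).
RPS = [[0, -1, 1],
       [1, 0, -1],
       [-1, 1, 0]]
UNIFORM = [1 / 3, 1 / 3, 1 / 3]

if __name__ == "__main__":
    x, y = self_play(RPS, [0.8, 0.1, 0.1], [0.2, 0.2, 0.6], UNIFORM, UNIFORM)
    print("row policy:", [round(p, 4) for p in x])
    print("column policy:", [round(p, 4) for p in y])
```

With a uniform reference policy, the final iterates of both players approach the uniform strategy, which is the equilibrium of rock-paper-scissors. The KL term pulls each update toward the reference, which is what keeps the last iterate from cycling in this toy setting.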

Why It Matters

A faster rate in multi-agent offline learning means a dataset of a given size yields a more accurate equilibrium estimate, which makes training AI opponents and other game-theoretic systems more data-efficient.