New Study: AI Game Solvers Don't All Pick the Same Nash Equilibrium

arXiv cs.GT June 29, 2026

⚡R-NaD finds max-entropy strategies; CFR+ drifts to lower-entropy faces in zero-sum games.

Deep Dive

A new paper by Luis Leal tackles a subtle but critical question in game theory and multi-agent AI: when a zero-sum game has many Nash equilibria, which together form a convex polytope, which one do standard solvers actually find? Using a tabular testbed of six analytically solvable games, including one with a two-dimensional Nash polytope and Kuhn poker, Leal shows that algorithm choice, not random seed, determines the selection. Regularized last-iterate methods such as R-NaD and magnetic mirror descent consistently pick the maximum-entropy equilibrium, which corresponds to the information projection (I-projection) of a uniform reference onto the Nash set. In contrast, regret-averaging algorithms (CFR, CFR+, fictitious play) converge to a lower-entropy face of the polytope.
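To make the maximum-entropy / I-projection idea concrete, here is a minimal sketch on a toy game that is not taken from the paper: matching pennies in which the row player's "heads" action is duplicated. The row player's Nash strategies then form a line segment, and because KL(p || uniform) = log(n) - H(p), the point of that segment closest to uniform in KL divergence is also its maximum-entropy point. The payoff matrix, the helper names and the grid search are all illustrative assumptions; the sketch runs no solver (no R-NaD, no CFR) and only characterizes the target equilibrium.

```python
import math

# Hypothetical toy game (not from the paper): matching pennies where the row
# player's "heads" action appears twice. Payoffs are to the row player.
A = [
    [1.0, -1.0],   # heads, copy a
    [1.0, -1.0],   # heads, copy b
    [-1.0, 1.0],   # tails
]

def entropy(p):
    """Shannon entropy in nats, treating 0 * log(0) as 0."""
    return -sum(x * math.log(x) for x in p if x > 0.0)

def kl_to_uniform(p):
    """KL(p || uniform), which equals log(n) - H(p)."""
    return math.log(len(p)) - entropy(p)

def worst_case_value(p):
    """Payoff the row strategy p guarantees against any pure column reply."""
    n_rows, n_cols = len(A), len(A[0])
    return min(sum(p[i] * A[i][j] for i in range(n_rows)) for j in range(n_cols))

def nash_row_strategy(t):
    """Point on the row player's Nash segment: heads mass 1/2 split as (t, 1/2 - t)."""
    return [t, 0.5 - t, 0.5]

# Grid over the segment, t in [0, 1/2]; a coarse search is enough for a sketch.
grid = [0.5 * k / 100 for k in range(101)]
candidates = [nash_row_strategy(t) for t in grid]

# Every candidate guarantees the game value 0, so all of them are equilibrium strategies.
assert all(abs(worst_case_value(p)) < 1e-12 for p in candidates)

max_ent = max(candidates, key=entropy)        # maximum-entropy member
min_kl = min(candidates, key=kl_to_uniform)   # I-projection of the uniform reference

print("max-entropy member:", max_ent, "entropy:", entropy(max_ent))
print("vertex of the segment:", candidates[0], "entropy:", entropy(candidates[0]))
```

The two selections coincide, and both endpoints of the segment have strictly lower entropy than the interior point. In the article's terms, the interior point is the equilibrium a regularized last-iterate method is reported to select, while a lower-entropy face corresponds to points nearer the endpoints.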

The findings are backed by a large randomized experiment on 180 games: R-NaD attained the maximum-entropy member in 100% of converged runs, while CFR+ sat strictly below it in 94% of games (paired Wilcoxon p < 10^-27). The chosen equilibrium matters downstream: against sub-optimal opponents, the max-entropy member acts as a better hedge in sequential or hidden-information games like Kuhn poker. Two negative results correct common intuitions. First, removing CFR's positive-orthant projection does not eliminate boundary drift. Second, R-NaD's selection is anchor-following (it depends on the choice of reference), not initialization-independent. The paper presents the maximum-entropy / I-projection characterization as a strongly supported conjecture, checked against analytic ground truth.

Key Points

Regularized last-iterate methods (R-NaD, magnetic mirror descent) select the maximum-entropy Nash equilibrium exactly on a 2-D polytope and at 99.7% of maximum entropy in Kuhn poker.
Regret-averaging methods (CFR, CFR+, fictitious play) drift to lower-entropy faces; in a 180-game ensemble, CFR+ was below max-entropy in 94% of games (p < 10^-27).
The selected equilibrium has real consequences: the max-entropy equilibrium provides a better hedge against sub-optimal opponents in sequential/hidden-information games like Kuhn poker.

Why It Matters

The solver you choose silently determines which Nash equilibrium you get, and that affects how robust the resulting strategy is in real-world games.
