New quantum regret algorithm guarantees channel-proof learning in games
Coherent swap regret reaches O(√(dT log d)) via entropic mirror ascent on the CPTP Choi slice.
Sohail Sarkar's paper presents a new regret framework for quantum games, addressing a fundamental gap in how stability is measured when players can apply local completely positive trace-preserving (CPTP) maps. Traditional external regret only compares against fixed alternative strategies, but in quantum settings players can physically transform the state they receive. The proposed coherent swap regret captures all such local CPTP deviations, and the author gives an algorithm that achieves O(√(dT log d)) regret via entropic mirror ascent on the CPTP Choi slice, combined with a fixed-point play rule.
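To make the "CPTP Choi slice" and the fixed-point play rule more concrete, here is a minimal NumPy sketch. It is not the paper's algorithm and omits the mirror-ascent update entirely. It assumes a deviation channel given by Kraus operators, builds its Choi matrix J = Σ_ij |i⟩⟨j| ⊗ Φ(|i⟩⟨j|), and checks the trace-preserving constraint Tr_out J = I that cuts the slice out of the positive semidefinite cone. It also assumes one common reading of a fixed-point play rule, borrowed from classical swap-regret reductions: play a state ρ with Φ(ρ) = ρ. That state is found here by iterating the lazy map (id + Φ)/2, a stand-in for whatever solver the paper uses. All function names are hypothetical.

```python
import numpy as np

def apply_channel(kraus, rho):
    """Apply Phi(rho) = sum_k K_k rho K_k^dagger for Kraus operators K_k."""
    return sum(K @ rho @ K.conj().T for K in kraus)

def choi(kraus, d):
    """Choi matrix J = sum_ij |i><j| (x) Phi(|i><j|), input factor first."""
    J = np.zeros((d * d, d * d), dtype=complex)
    for i in range(d):
        for j in range(d):
            E = np.zeros((d, d), dtype=complex)
            E[i, j] = 1.0
            J += np.kron(E, apply_channel(kraus, E))
    return J

def partial_trace_output(J, d):
    """Trace out the output (second) tensor factor of a (d*d) x (d*d) matrix."""
    # Index layout after reshape: [in_row, out_row, in_col, out_col].
    return np.trace(J.reshape(d, d, d, d), axis1=1, axis2=3)

def is_trace_preserving(kraus, d, tol=1e-9):
    """Slice membership: Tr_out J = I_d (complete positivity holds for any Kraus form)."""
    return np.allclose(partial_trace_output(choi(kraus, d), d), np.eye(d), atol=tol)

def fixed_point_state(kraus, d, iters=5000):
    """Approximate a state rho with Phi(rho) = rho by iterating (id + Phi)/2."""
    # Averaging with the identity damps rotating eigenmodes, so the iterates settle
    # on an invariant state; the speed depends on the channel's spectral gap.
    rho = np.eye(d, dtype=complex) / d
    for _ in range(iters):
        rho = 0.5 * (rho + apply_channel(kraus, rho))
    return rho
```

For example, an amplitude-damping channel with Kraus operators [[1, 0], [0, √(1−γ)]] and [[0, √γ], [0, 0]] passes the slice check, and the iteration converges to its invariant state |0⟩⟨0|.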
The analysis reveals a three-level hierarchy: replacement channels recover ordinary external regret, unital channels (including unitary deviations) have zero minimax regret, and deterministic measurement-preparation channels force Ω(√(dT log d)) regret. The difficulty therefore comes from non-unital use of the recommendation register. As an application, decentralized full-information learning in finite quantum games reaches an ε-approximate separable quantum correlated equilibrium after T = O(max_i d_i log d_i / ε²) rounds. The paper also provides an SDP audit for local CPTP exploitability and a probing-bandit extension with pseudo-regret O(d^{4/3} T^{2/3} (log d)^{1/3}).
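The three levels can be told apart by simple structural checks on the deviation channel. The NumPy sketch below computes no regret at all. It only builds Kraus operators for each class and tests unitality, Φ(I) = I. A replacement channel ρ ↦ Tr(ρ)σ ignores the register entirely. A unitary channel is unital. A deterministic measure-prepare channel that measures |k⟩ and prepares |f(k)⟩ reads the register, and it is non-unital whenever the hypothetical relabeling f is not a permutation. The function names and the choice of the computational basis are illustrative assumptions.

```python
import numpy as np

def apply_channel(kraus, rho):
    """Apply Phi(rho) = sum_k K_k rho K_k^dagger."""
    return sum(K @ rho @ K.conj().T for K in kraus)

def is_unital(kraus, d):
    """Unital means Phi(I) = I; every unitary channel satisfies this."""
    identity = np.eye(d, dtype=complex)
    return np.allclose(apply_channel(kraus, identity), identity)

def replacement_kraus(sigma):
    """Kraus operators sqrt(lam_a)|v_a><k| for rho -> Tr(rho) * sigma."""
    d = sigma.shape[0]
    lam, vecs = np.linalg.eigh(sigma)  # sigma = sum_a lam_a |v_a><v_a|
    kraus = []
    for a in range(d):
        for k in range(d):
            e_k = np.zeros(d)
            e_k[k] = 1.0
            kraus.append(np.sqrt(max(lam[a], 0.0)) * np.outer(vecs[:, a], e_k))
    return kraus

def measure_prepare_kraus(f, d):
    """Measure in the computational basis and prepare |f(k)>: K_k = |f(k)><k|."""
    kraus = []
    for k in range(d):
        K = np.zeros((d, d), dtype=complex)
        K[f[k], k] = 1.0
        kraus.append(K)
    return kraus
```

Under these assumptions, a Hadamard channel is unital, replacement by |0⟩⟨0| is not, and measure-prepare with f = [0, 0] (both outcomes prepare |0⟩) is not, while f = [1, 0] (a relabeling that is a permutation) is unital.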
- Coherent swap regret measures stability against any local CPTP map, not just fixed alternatives.
- Algorithm achieves O(√(dT log d)) regret using entropic mirror ascent on the Choi slice.
- Decentralized learning reaches an ε-approximate separable quantum correlated equilibrium in O(max_i d_i log d_i / ε²) rounds.
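The round count follows from the regret bound by simple arithmetic. If player i's regret is at most C√(d_i T log d_i), its average regret is C√(d_i log d_i / T). That is at most ε exactly when T ≥ C² d_i log d_i / ε², and the largest player sets the pace. The sketch below assumes the natural logarithm and a hypothetical constant C = 1, since the O(·) bound does not state one.

```python
import math

def rounds_needed(dims, eps, C=1.0):
    """Smallest T with C*sqrt(d*T*log d)/T <= eps for every player dimension d.

    C is a hypothetical stand-in for the unstated constant in the O(.) bound.
    """
    if eps <= 0:
        raise ValueError("eps must be positive")
    worst = max(d * math.log(d) for d in dims)  # max_i d_i log d_i
    return math.ceil(C**2 * worst / eps**2)
```

For two players with d = 2 and d = 4 and ε = 0.1, the d = 4 player dominates (4 ln 4 ≈ 5.545), giving about 555 rounds under these assumptions. Halving ε roughly quadruples T.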
Why It Matters
Enables provably stable multi-agent quantum learning and provides audit tools for quantum recommendation protocols.