Gaba & Guha settle window mean-payoff verification in MDPs: P for fixed windows, NP ∩ coNP for bounded
New complexity results for window mean payoff could tighten AI safety guarantees.
A new paper by Pranshu Gaba and Shibashis Guha tackles two verification problems in Markov decision processes (MDPs) with window mean-payoff objectives. The window mean-payoff objective strengthens the classical mean payoff: instead of averaging over the whole infinite run, it evaluates the average reward over a finite window that slides along the run and requires that average to meet a threshold. The authors study two problems, each with thresholds α and β. In the sure-almost-sure problem, a single strategy must guarantee payoff at least α on every outcome and achieve payoff at least β with probability 1. In the sure-limit-sure problem, for every ε > 0 there must be a strategy that guarantees α on every outcome and achieves β with probability at least 1 − ε.
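To make the sliding-window idea concrete, here is a minimal Python sketch that checks a fixed-window condition on a finite prefix of a single run. It assumes the standard formulation from the window-objectives literature (from every position, some window of length at most ℓ must reach an average of at least the threshold); the paper's exact definitions may differ in detail. The function names `window_closes` and `fixed_window_holds` and the sample rewards are illustrative and do not come from the paper. The sketch does not solve the MDP problems the paper addresses.

```python
def window_closes(weights, start, ell, threshold):
    """Return True if some window of length 1..ell beginning at `start`
    has mean reward >= threshold (assumed standard formulation)."""
    total = 0
    for k in range(1, ell + 1):
        total += weights[start + k - 1]
        # mean >= threshold  <=>  total >= threshold * k (avoids division)
        if total >= threshold * k:
            return True
    return False


def fixed_window_holds(weights, ell, threshold):
    """Check every position of a finite prefix that still has `ell` steps
    ahead of it. Later positions are inconclusive and are skipped, so a
    prefix shorter than `ell` is accepted vacuously."""
    last_start = len(weights) - ell
    return all(
        window_closes(weights, i, ell, threshold)
        for i in range(last_start + 1)
    )


# Hypothetical reward sequence along one run.
rewards = [2, -1, 3, -2, 4, 0, 1]
print(fixed_window_holds(rewards, ell=2, threshold=0))  # True
print(fixed_window_holds(rewards, ell=1, threshold=0))  # False
```

With ℓ = 2 every dip below zero is compensated within two steps, so the check passes. With ℓ = 1 each single reward must itself meet the threshold, and the −1 at the second position fails.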
They prove that when the maximum window length ℓ is fixed and given in unary, both problems are solvable in polynomial time (P). For the bounded variant, where no window length is given in advance but the window length must be bounded along the run, they show that both problems lie in NP ∩ coNP, matching the complexity of checking sure satisfaction and almost-sure satisfaction separately. The work also establishes tight bounds on the memory that winning strategies need, which matters when deploying verified controllers in domains such as robotics or finance. The paper, posted on arXiv (41 pages, 10 figures), sits at the intersection of computer science, game theory, and probability, with implications for reliable performance guarantees in AI systems.
- Sure-almost-sure and sure-limit-sure problems solved for window mean-payoff MDPs
- Fixed window length variant is in P; bounded variant is in NP ∩ coNP
- Memory bounds for winning strategies derived for both variants
Why It Matters
The results help formally verify reward guarantees in reinforcement learning and autonomous decision-making systems, a key step toward safety.