Gaba & Guha solve MDP verification: P for fixed windows, NP∩coNP for bounded

arXiv cs.GT May 13, 2026

⚡New complexity results for window mean payoff could tighten AI safety guarantees.

Deep Dive

A new paper by Pranshu Gaba and Shibashis Guha tackles two fundamental verification problems in Markov decision processes (MDPs) with window mean-payoff objectives. The window mean payoff strengthens the classic mean payoff: instead of looking only at the long-run average, it evaluates the average payoff over a finite window that slides along the run and requires it to meet a threshold. The authors define two problems. In the sure-almost-sure problem, a strategy must guarantee a minimum payoff α on all outcomes and achieve β almost surely. In the sure-limit-sure problem, for every ε > 0 there must be a strategy that guarantees α on all outcomes and achieves β with probability at least 1 − ε.
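To make the sliding-window idea concrete, here is a minimal Python sketch that checks a finite sequence of payoffs. It is not the paper's algorithm and does not involve an MDP. It assumes the formulation commonly used for window objectives, in which a window starting at a given position is "good" if, for some length between 1 and ℓ, the average payoff over that window is at least the threshold. The function names `window_closes` and `good_positions` are hypothetical and chosen only for illustration.

```python
from typing import List, Sequence


def window_closes(payoffs: Sequence[float], start: int,
                  max_len: int, threshold: float) -> bool:
    """Return True if some window of length 1..max_len beginning at `start`
    has average payoff >= threshold.

    Assumption: this mirrors the usual window mean-payoff condition on a
    finite prefix; windows that run past the end of the prefix are ignored.
    """
    total = 0.0
    for k in range(1, max_len + 1):
        idx = start + k - 1
        if idx >= len(payoffs):
            break  # no more observed payoffs to extend the window
        total += payoffs[idx]
        if total / k >= threshold:
            return True
    return False


def good_positions(payoffs: Sequence[float], max_len: int,
                   threshold: float) -> List[bool]:
    """For each position, report whether a good window starts there."""
    return [window_closes(payoffs, i, max_len, threshold)
            for i in range(len(payoffs))]


if __name__ == "__main__":
    run = [1, -1, 3, -2, 2]
    # With windows of length at most 2, every dip is repaired one step later.
    print(good_positions(run, max_len=2, threshold=0))
    # With windows of length 1, each negative payoff fails on its own.
    print(good_positions(run, max_len=1, threshold=0))
```

On the sample run, a window length of 2 makes every position good, while a window length of 1 fails at the two negative payoffs. This illustrates why the window length ℓ matters: a larger ℓ gives the run more time to recover from a bad step.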

They prove that for a fixed window length ℓ (given in unary), both problems are solvable in polynomial time (P). For the bounded variant, where the window length is not fixed in advance but must be bounded along the run, they show that the problems lie in NP ∩ coNP, matching the complexity of sure satisfaction and almost-sure satisfaction considered separately. The work also establishes tight memory bounds for the strategies, an essential factor for deploying verified controllers in robotics or finance. The paper, posted on arXiv with 41 pages and 10 figures, sits at the intersection of computer science, game theory, and probability, with direct implications for ensuring reliable performance in AI systems.

Key Points

Sure-almost-sure and sure-limit-sure problems solved for window mean-payoff MDPs
Fixed window length variant is in P; bounded variant is in NP ∩ coNP
Memory bounds for winning strategies derived for both variants

Why It Matters

The results help formally verify reward guarantees in reinforcement learning and autonomous decision-making systems, a key step toward safety.
