Sample Complexity Bounds for Stochastic Shortest Path with a Generative Model
New paper shows that learning an ε-optimal SSP policy requires Ω(SAB⋆³/(c_minε²)) samples, and that SSP problems with zero-cost actions may be unlearnable altogether.
A team of researchers including Jean Tarbouriech, Matteo Pirotta, Michal Valko, and Alessandro Lazaric has published groundbreaking work on the sample complexity of learning Stochastic Shortest Path (SSP) problems, a fundamental reinforcement learning framework. Their paper, accepted at ALT 2021, establishes that learning an ε-optimal policy in SSPs requires at least Ω(SAB⋆³/(c_minε²)) samples from a generative model, where S is the number of states, A the number of actions, c_min the minimum per-step cost, and B⋆ the maximum expected cost of the optimal policy over starting states. This provides the first tight characterization of SSP learning difficulty.
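To get a feel for the scale the lower bound implies, here is a small illustrative calculation. All parameter values below are hypothetical, chosen only to show how the terms in Ω(SAB⋆³/(c_minε²)) interact; they do not come from the paper.

```python
# Illustrative (hypothetical) parameters plugged into the lower bound
# Omega(S * A * B_star**3 / (c_min * eps**2)).
S = 100        # number of states
A = 10         # actions per state
B_star = 5.0   # max expected cost of the optimal policy from any state
c_min = 0.1    # minimum per-step cost (bound blows up as this -> 0)
eps = 0.05     # target suboptimality of the learned policy

samples_lower_bound = S * A * B_star**3 / (c_min * eps**2)
print(f"{samples_lower_bound:.2e}")  # order of the required sample count
```

Note how the bound diverges as c_min shrinks toward zero, which foreshadows the unlearnability result discussed next.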
The research reveals a surprising theoretical limitation: when the minimum cost c_min equals zero, SSP problems may become fundamentally unlearnable. This distinguishes SSP learning from finite-horizon and discounted settings, where zero-cost transitions create no such barrier. The team complemented their lower bound with matching algorithms: one achieves the bound up to logarithmic factors in the general case, and a second, specialized algorithm works even when c_min=0, provided the optimal policy has bounded expected hitting time to the goal state.
This work establishes SSP as a distinct complexity class in reinforcement learning theory, with implications for how researchers approach planning and learning in stochastic environments. The findings suggest that practical SSP implementations must carefully consider cost structures and may need to impose minimum costs to ensure learnability. The paper provides both fundamental limits and constructive algorithms, offering a complete theoretical picture of what makes SSP problems tractable or intractable.
Key Findings
- Proved a lower bound of Ω(SAB⋆³/(c_minε²)) samples needed for ε-optimal SSP policies with generative model access
- Revealed SSP problems with zero minimum cost (c_min=0) may be fundamentally unlearnable, unlike finite-horizon/discounted RL
- Provided matching algorithms achieving the bound up to logarithmic factors, including specialized algorithm for c_min=0 cases with bounded hitting time
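To make the central quantity B⋆ concrete, here is a minimal value-iteration sketch on a toy two-state SSP. This is a standard SSP planning routine run on a made-up MDP, not the paper's learning algorithm: the optimal cost-to-go V is computed with full knowledge of the model, and B⋆ is its maximum over non-goal states.

```python
# Toy SSP (illustrative, not from the paper): states 0 and 1, goal 'g'.
# Each action is (cost, {next_state: probability}); V(g) = 0 by definition.
mdp = {
    0: [(1.0, {1: 0.5, "g": 0.5}),   # cheap but may detour through state 1
        (2.0, {"g": 1.0})],          # costlier but reaches the goal surely
    1: [(1.0, {0: 0.9, "g": 0.1})],
}

V = {0: 0.0, 1: 0.0, "g": 0.0}
for _ in range(10_000):  # iterate the Bellman operator to a fixed point
    for s, actions in mdp.items():
        V[s] = min(c + sum(p * V[t] for t, p in P.items()) for c, P in actions)

B_star = max(V[s] for s in mdp)  # worst-case optimal expected cost-to-goal
print(round(B_star, 3))
```

In this toy model the fixed point is V(0)=2 and V(1)=2.8, so B⋆=2.8; the lower bound above scales with the cube of this quantity.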
Why It Matters
Establishes fundamental limits for reinforcement learning in stochastic environments, guiding development of provably efficient AI planning algorithms.