On the Peril of (Even a Little) Nonstationarity in Satisficing Regret Minimization
A new theoretical paper proves that even minimal change in a bandit's environment forces the optimal satisficing regret to grow logarithmically with time.
A new theoretical paper from researchers Yixuan Zhang, Ruihao Zhu, and Qiaomin Xie tackles a core problem in online learning: how well can an algorithm perform when the world it's learning from changes, even slightly? The work focuses on the 'satisficing regret' framework for nonstationary multi-armed bandits, a model used everywhere from clinical trials to online advertising. Satisficing regret measures the cost of not finding a 'good enough' arm, rather than the absolute best one. The authors' key and surprising result is that the presence of any nonstationarity, modeled as just two or more stationary segments (L ≥ 2) over a horizon of T rounds, forces the optimal regret to scale logarithmically with time, specifically as Θ(L log T).
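Concretely, for a satisficing threshold S and (possibly time-varying) arm means, satisficing regret charges the learner only for the shortfall below S. The notation below is an illustrative sketch of this standard formalization, not necessarily the paper's exact definitions:

```latex
% Satisficing regret at threshold S over horizon T (illustrative notation):
% the learner pays only in rounds where the pulled arm A_t falls short of S.
\[
  \mathrm{Reg}_S(T) \;=\; \mathbb{E}\!\left[ \sum_{t=1}^{T} \bigl( S - \mu_t(A_t) \bigr)^{+} \right],
  \qquad (x)^{+} := \max\{x, 0\},
\]
% where \mu_t(a) is the mean reward of arm a at round t. In this notation the
% headline result says that, once the means change across L >= 2 stationary
% segments, the minimax value of this quantity is
\[
  \inf_{\pi} \, \sup_{\text{instances with } L \ge 2} \, \mathrm{Reg}_S(T) \;=\; \Theta(L \log T).
\]
```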
This stands in stark contrast to the perfectly stationary setting (L = 1), where prior work showed that constant, time-independent regret (Θ(1)) is achievable under realizability assumptions. In essence, the paper proves that 'even a little' change breaks the possibility of constant regret, establishing a fundamental lower bound. To prove this, the authors develop a novel Fano-based analytical framework built on a 'post-interaction reference' construction, extending classical statistical methods to interactive, nonstationary settings. As a complement, they identify a special regime in which constant regret remains possible, delineating the precise boundary of the phenomenon. The finding has significant implications for the theoretical understanding of adaptive systems and the design of robust online learning algorithms.
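For context, the classical tool being extended is Fano's inequality; a standard statement appears below. How the 'post-interaction reference' construction adapts it to data collected by an adaptive policy is the paper's contribution and is not reproduced here:

```latex
% Fano's inequality: if V is uniform over M >= 2 hypotheses and \hat{V} is any
% estimator of V computed from observations X, then
\[
  \Pr\bigl( \hat{V} \neq V \bigr) \;\ge\; 1 - \frac{I(V; X) + \log 2}{\log M}.
\]
% Regret lower bounds typically follow by embedding M hard-to-distinguish
% bandit instances and bounding the mutual information I(V; X); the subtlety
% in interactive settings is that the observations X depend on the policy.
```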
- Proves that the optimal satisficing regret for nonstationary bandits is Θ(L log T) for L ≥ 2 segments, forcing a logarithmic dependence on the horizon T.
- Highlights a sharp phase change: outside the special regime the paper identifies, constant regret (Θ(1)) is possible only in perfectly stationary environments (L = 1); a toy simulation of this gap follows the list.
- Introduces a novel Fano-based analytical framework tailored for nonstationary, interactive learning problems.
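The phase change in the bullets above can be made tangible with a toy simulation: the commit-once-satisfied strategy below keeps its satisficing regret bounded on a stationary two-armed instance but keeps paying after a single change point. This is an illustrative sketch with made-up parameters (threshold S = 0.5, Bernoulli arms), not the paper's algorithm or its lower-bound construction:

```python
import numpy as np

rng = np.random.default_rng(0)

def satisficing_regret(mu_segments, T, S=0.5, conf=2.0):
    """Toy 'commit once satisfied' strategy on a 2-armed Bernoulli bandit.

    mu_segments: one (mu_arm0, mu_arm1) pair per equal-length segment.
    Returns cumulative satisficing regret: sum_t max(S - mu_t(a_t), 0).
    """
    L = len(mu_segments)
    seg_len = T // L
    counts = np.zeros(2)
    sums = np.zeros(2)
    committed = None
    regret = 0.0
    for t in range(T):
        mu = mu_segments[min(t // seg_len, L - 1)]
        # Round-robin exploration until some arm's lower confidence bound
        # clears the threshold S, then commit to that arm forever.
        a = committed if committed is not None else t % 2
        regret += max(S - mu[a], 0.0)
        reward = rng.binomial(1, mu[a])
        counts[a] += 1
        sums[a] += reward
        if committed is None:
            lcb = (sums / np.maximum(counts, 1)
                   - np.sqrt(conf * np.log(t + 2) / np.maximum(counts, 1)))
            above = np.flatnonzero(lcb > S)
            if above.size:
                committed = int(above[0])
    return regret

T = 20_000
# Stationary (L=1): arm 1 satisfies S throughout, so regret stays bounded.
print("L=1 regret:", round(satisficing_regret([(0.3, 0.7)], T), 1))
# Two segments (L=2): the satisficing arm flips at T/2; a one-shot commit
# keeps paying the shortfall on every round of the second segment.
print("L=2 regret:", round(satisficing_regret([(0.3, 0.7), (0.7, 0.3)], T), 1))
```

In typical runs, the stationary instance's regret plateaus once the strategy commits, while the two-segment instance keeps accumulating shortfall after the midpoint flip. The paper's Θ(L log T) result says this cannot be fully repaired: even strategies that restart or adapt cleverly must pay on the order of log T per segment.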
Why It Matters
Sets a fundamental performance limit for AI systems that must learn and adapt in environments that change over time.