A single algorithm for both restless and rested rotting bandits
A single algorithm now tackles both types of decaying rewards...
A team of researchers from Inria and Google has developed RAW-UCB (Rotting Adaptive Window UCB), a novel algorithm that solves a longstanding challenge in bandit problems: handling both restless and rested rotting settings with a single approach. In restless bandits, rewards decay due to external factors (like content becoming outdated), while in rested bandits, decay comes from repeated selection (like user boredom). Levine et al. (2017) had shown that state-of-the-art restless algorithms fail in rested settings, implying these problems were fundamentally different.
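To make the restless/rested distinction concrete, here is a minimal Python sketch of the two reward models; the decay rates and functional forms are illustrative assumptions, not taken from the paper:

```python
def rested_mean(n_pulls: int) -> float:
    """Rested rotting: the arm's mean decays with how often THAT arm has been
    pulled, e.g. a user growing bored of a repeatedly recommended item.
    (Illustrative decay rate, not from the paper.)"""
    return max(0.0, 1.0 - 0.05 * n_pulls)

def restless_mean(t: int) -> float:
    """Restless rotting: the arm's mean decays with global time, whether or not
    the arm is pulled, e.g. a news article becoming outdated.
    (Illustrative decay rate, not from the paper.)"""
    return max(0.0, 1.0 - 0.01 * t)

n_pulls = 0
for t in range(1, 6):
    if t % 2 == 0:               # pull the arm only on even rounds
        n_pulls += 1
    print(f"t={t}  rested mean={rested_mean(n_pulls):.2f}  "
          f"restless mean={restless_mean(t):.2f}")
```

In the rested model the mean only moves on rounds where the arm is actually pulled; in the restless model it drifts down every round regardless.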
RAW-UCB achieves near-optimal regret in both settings without any prior knowledge of the environment type or of the nature of the non-stationarity (e.g., piece-wise constant or bounded-variation decay). This stands in striking contrast to earlier negative results showing that no algorithm can achieve comparable guarantees when rewards are also allowed to increase. The algorithm's effectiveness was confirmed in synthetic and dataset-based experiments, and the work was published at AISTATS 2020.
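The digest does not reproduce the algorithm itself, but its name points at the mechanism: for each arm, compute upper confidence bounds over windows of that arm's most recent pulls and keep the tightest one. The sketch below illustrates that adaptive-window idea under stated assumptions; the confidence-bonus form, the constants `sigma` and `alpha`, and the toy rested-decay environment are illustrative choices, not the paper's exact specification.

```python
import math
import random

def raw_ucb_index(rewards_newest_first, t, sigma=1.0, alpha=1.4):
    """Adaptive-window upper confidence index (sketch): for every window of the
    h most recent rewards of an arm, form (empirical mean + confidence bonus)
    and keep the smallest such upper bound. Bonus form and constants are
    illustrative assumptions, not the paper's."""
    if not rewards_newest_first:
        return float("inf")          # force one initial pull of each arm
    best = float("inf")
    total = 0.0
    for h, r in enumerate(rewards_newest_first, start=1):
        total += r
        bonus = sigma * math.sqrt(2 * alpha * math.log(t + 1) / h)
        best = min(best, total / h + bonus)
    return best

# Toy usage: two arms whose means decay with their own pull counts (rested rotting).
random.seed(0)
histories = [[], []]                 # newest-first reward history per arm
for t in range(1, 201):
    arm = max(range(2), key=lambda a: raw_ucb_index(histories[a], t))
    mean = max(0.0, (0.9 if arm == 0 else 0.6) - 0.01 * len(histories[arm]))
    histories[arm].insert(0, mean + random.gauss(0, 0.1))
print("pulls per arm:", [len(h) for h in histories])
```

Because the index is built only from each arm's own recent rewards, the same rule applies whether the decay is driven by time or by the arm's pull count, which is what lets one algorithm cover both settings.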
- RAW-UCB achieves near-optimal regret in both restless and rested rotting bandits with no prior knowledge of the setting or non-stationarity type.
- Challenges the conclusion drawn from Levine et al. (2017), whose results suggested that the restless and rested rotting settings require fundamentally different algorithms.
- Validated on synthetic and real-dataset experiments; published at AISTATS 2020.
Why It Matters
A unified algorithm for decaying rewards simplifies systems such as recommenders and tutoring platforms, where rewards may fade with time or with repeated exposure: there is no need to identify which kind of decay applies before deployment.