Research & Papers

Regret Analysis of Sleeping Competing Bandits

New algorithm tackles real-world chaos where agents and options can go 'offline' arbitrarily over time.

Deep Dive

Researchers Shinnosuke Uba and Yutaro Yamaguchi have published a significant theoretical advance in online learning with their paper 'Regret Analysis of Sleeping Competing Bandits.' The work extends the established Competing Bandits framework, which integrates multi-armed bandit algorithms with game-theoretic stable matching, to a far more realistic 'sleeping' scenario. In real-world systems—from online job matching to dynamic ad auctions—not all 'players' (like job seekers) or 'arms' (like job openings) are constantly available; they can appear and disappear arbitrarily over time. This paper formally models this unpredictable availability, a major step toward practical application.

To analyze this more complex problem, the authors first had to extend the definition of 'regret'—the metric of performance loss compared to an ideal matchmaker—to account for rounds in which players or arms are asleep. They then designed an algorithm that, under reasonable assumptions, achieves an asymptotic regret upper bound of O(NK log T_i / Δ²), where N is the number of players, K the number of arms, T_i the number of rounds for player i, and Δ the minimum reward gap. Crucially, they also proved a lower bound of Ω(N(K−N+1) log T_i / Δ²), which matches the upper bound up to constant factors whenever the number of arms sufficiently exceeds the number of players, since K−N+1 = Θ(K) in that regime. This dual result establishes that their algorithm is asymptotically optimal in the common practical setting where options (K) outnumber decision-makers (N). The work provides a rigorous mathematical backbone for building real-time matching systems that remain robust and efficient when participants drop in and out.
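For intuition on the 'sleeping' setting, the single-player special case can be sketched as a UCB-style rule restricted to whichever arms are awake each round, with regret charged against the best *available* arm. This is a toy illustration only—the function name, the 80% availability probability, and the Bernoulli reward model are invented here, and the paper's actual algorithm coordinates N competing players via stable matching, which this sketch omits:

```python
import math
import random

def sleeping_ucb(means, T, seed=0):
    """Toy single-player sleeping bandit.

    Each round a random subset of arms is 'awake'; we play the awake
    arm with the highest UCB1 index and accumulate pseudo-regret
    against the best awake arm that round.
    """
    rng = random.Random(seed)
    K = len(means)
    counts = [0] * K      # pulls per arm
    sums = [0.0] * K      # total reward per arm
    regret = 0.0
    for t in range(1, T + 1):
        # Arbitrary availability, modeled here as i.i.d. coin flips.
        awake = [a for a in range(K) if rng.random() < 0.8]
        if not awake:
            continue  # nothing to play this round

        def ucb_index(a):
            if counts[a] == 0:
                return float("inf")  # try untested awake arms first
            mean = sums[a] / counts[a]
            return mean + math.sqrt(2 * math.log(t) / counts[a])

        arm = max(awake, key=ucb_index)
        reward = 1.0 if rng.random() < means[arm] else 0.0
        counts[arm] += 1
        sums[arm] += reward
        # Regret compares only against arms available this round.
        regret += max(means[a] for a in awake) - means[arm]
    return regret

print(f"cumulative pseudo-regret: {sleeping_ucb([0.9, 0.5, 0.4], 5000):.1f}")
```

The defining twist relative to classic bandits is visible in the last line of the loop: the benchmark shifts each round to the best awake arm, which is exactly why the notion of regret has to be redefined before any bound can be stated.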

Key Points
  • Extends Competing Bandits to a 'sleeping' model where player/arm availability changes arbitrarily, mirroring real-world dynamic systems.
  • Proposes an algorithm with a proven regret upper bound of O(NK log T_i/Δ²) and a lower bound of Ω(N(K−N+1) log T_i/Δ²) that matches it up to constants when K sufficiently exceeds N, establishing asymptotic optimality.
  • The optimality guarantee holds specifically when the number of arms (K) is larger than the number of players (N), a common practical scenario.
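The relationship between the two bounds can be sanity-checked with a quick back-of-the-envelope comparison (our own arithmetic, not a claim from the paper):

```latex
\underbrace{O\!\left(\tfrac{NK\log T_i}{\Delta^2}\right)}_{\text{upper bound}}
\quad\text{vs.}\quad
\underbrace{\Omega\!\left(\tfrac{N(K-N+1)\log T_i}{\Delta^2}\right)}_{\text{lower bound}},
\qquad
K \ge 2(N-1) \;\Longrightarrow\; K-N+1 \ge \tfrac{K}{2}.
```

So once arms outnumber players by roughly two to one, the two bounds differ by at most a constant factor of 2, which is the precise sense in which the guarantee is tight for K > N.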

Why It Matters

Provides a theoretical foundation for robust, real-time AI systems in dynamic markets like job platforms, ride-sharing, and ad auctions where availability is unpredictable.