First lower bound on learning speed for mean-based algorithms with unknown horizons?

First lower bound on learning speed for mean-based algorithms with unknown horizons.

generalized ε-greedy and mean-based Exp3 for unknown horizons.

Mean-based algorithms can be both mean-based and no-regret depending on γ_t selection?

Mean-based algorithms can be both mean-based and no-regret depending on γ_t selection.

Research & Papers

New paper reveals limits of mean-based algorithms in bandit learning

arXiv cs.GT June 04, 2026

⚡First lower bound proves how fast these algorithms can learn with bandit feedback.

Deep Dive

A new arXiv paper by Julius Durmann and Amelie Kleber tackles a fundamental question in online learning: how fast can mean-based algorithms learn when the time horizon is unknown and only bandit feedback is available? Mean-based algorithms assign low probability to actions with low average rewards and are known to converge to serially undominated actions, which approximate Nash equilibria in games. However, empirical work had suggested slower convergence compared to classic bandit algorithms like EXP3 or UCB.

The authors provide the first rigorous lower bound on the algorithm-defining sequence γ_t, formally establishing a limit on learning speed. They also propose two new mean-based algorithms: one generalizes ε-greedy to unknown horizons, and the other extends mean-based Exp3 to work without a known time horizon. Experiments demonstrate that while mean-based algorithms are slightly slower, they perform competitively with other bandit-feedback algorithms. Crucially, the paper shows that depending on the choice of γ_t, algorithms can be both mean-based and no-regret, adding important context to earlier claims about their exploitability.

Key Points

First lower bound on learning speed for mean-based algorithms with unknown horizons.
Two new algorithms: generalized ε-greedy and mean-based Exp3 for unknown horizons.
Mean-based algorithms can be both mean-based and no-regret depending on γ_t selection.

Why It Matters

This work refines our understanding of algorithm safety and efficiency in online learning and game theory.

Read Original Article

New paper reveals limits of mean-based algorithms in bandit learning

Why It Matters

Related Articles

🚀 Stay Ahead in AI