Research & Papers

New Lipschitz bandit algorithm beats zooming bounds with instance-dependent regret

Potfer and Perchet improve regret rates by leveraging suboptimality gap integrals over level sets.

Deep Dive

In the classic Lipschitz bandit problem, a learner sequentially queries an unknown Lipschitz function f over a domain [0,1]^d for T rounds and aims to maximize cumulative reward. Existing regret bounds are either worst-case (scaling as T^{(d+1)/(d+2)}) or adaptive via the zooming dimension d_z (scaling as T^{(d_z+1)/(d_z+2)}). However, zooming-based guarantees are only partially instance-dependent: they capture the asymptotic growth of near-optimal level sets but not finer local structure.
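To make the gap between the two classical rates concrete, here is a minimal Python sketch that evaluates both exponents. The function names are illustrative and do not come from the paper, and constants and logarithmic factors are ignored, so the numbers indicate scaling only.

```python
def worst_case_exponent(d: float) -> float:
    """Exponent of T in the worst-case rate T^{(d+1)/(d+2)}."""
    return (d + 1) / (d + 2)


def zooming_exponent(d_z: float) -> float:
    """Exponent of T in the zooming rate T^{(d_z+1)/(d_z+2)}."""
    return (d_z + 1) / (d_z + 2)


def regret_scale(T: float, exponent: float) -> float:
    """Order of magnitude of the regret, ignoring constants and log factors."""
    return T ** exponent


if __name__ == "__main__":
    T = 1_000_000
    d, d_z = 5, 1  # hypothetical ambient and zooming dimensions
    print("worst-case:", regret_scale(T, worst_case_exponent(d)))
    print("zooming:   ", regret_scale(T, zooming_exponent(d_z)))
```

Because (x+1)/(x+2) increases with x, a zooming dimension below the ambient dimension always gives a smaller exponent.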

Potfer and Perchet introduce a new algorithm whose regret bounds are expressed as integrals of the suboptimality gap over level sets. This lets the bounds reflect local growth rates rather than only asymptotic behavior. As a key corollary, when the set of maximizers has dimension d* > 0, the regret improves to O(T^{(d_z+1)/(max(d_z,d*)+2)}), which the summary reports as strictly better than classical zooming bounds. The authors also extend their analysis to the full-information setting (Lipschitz experts) and show that some regularity assumptions can be relaxed. The work gives a finer theoretical picture of bandit optimization and may inform better practical algorithms for high-dimensional continuous decision spaces.
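The notion of level-set growth can be illustrated with a toy one-dimensional computation. The sketch below is not the authors' algorithm. It assumes the standard definitions (suboptimality gap = max f minus f(x), near-optimal level set = points with gap at most eps) and approximates, on a finite grid, how many eps-wide cells are needed to cover that level set. The two test functions and all names are hypothetical.

```python
import numpy as np


def covering_count(f, eps, grid_size=100_001):
    """Count eps-wide cells of [0, 1] containing a grid point with gap <= eps."""
    xs = np.linspace(0.0, 1.0, grid_size)
    values = f(xs)
    gaps = values.max() - values           # suboptimality gap on the grid
    near_opt = xs[gaps <= eps]             # grid points in the eps-level set
    cells = np.floor(near_opt / eps).astype(int)  # cell index of each point
    return len(np.unique(cells))


def peak(x):
    """Unique maximizer at 0.5, so the maximizer set has dimension 0."""
    return 1.0 - np.abs(x - 0.5)


def plateau(x):
    """Maximizers fill the interval [0.4, 0.6], a set of dimension 1."""
    return -np.maximum(np.abs(x - 0.5) - 0.1, 0.0)


if __name__ == "__main__":
    for eps in (0.1, 0.01, 0.001):
        print(eps, covering_count(peak, eps), covering_count(plateau, eps))
```

For the peaked function the count stays bounded as eps shrinks, while for the plateau it grows roughly like 1/eps. This is the kind of growth that the zooming dimension summarizes with a single number.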

Key Points
  • New regret bound scales as T^{(d_z+1)/(max(d_z,d*)+2)}, improving over the classical zooming bound T^{(d_z+1)/(d_z+2)} when the maximizer set has dimension d* > 0.
  • Regret bounds are expressed as integrals of the suboptimality gap over level sets, capturing local growth and not just asymptotic behavior.
  • Analysis extends to full-information (Lipschitz experts) setting and relaxes some regularity assumptions on the function.

Why It Matters

Tighter regret bounds mean fewer wasted queries when optimizing over continuous, high-dimensional decision spaces. Although the result is theoretical, it is relevant to applications such as hyperparameter tuning and adaptive experimentation.