Research & Papers

Self-Normalized Martingales and Uniform Regret Bounds for Linear Regression

A mathematical dichotomy shows that scale-invariant sublinear regret bounds exist only for one-dimensional regression.

Deep Dive

Self-normalized martingale inequalities are fundamental for confidence ellipsoids in online least squares and underpin many bandit and reinforcement learning results. However, existing bounds typically rely on bounded covariates and explicit regularization, so the resulting guarantees are not scale-invariant even though the self-normalized quantity itself is. A team of researchers—Fan Chen, Jian Qian, Alexander Rakhlin, and Nikita Zhivotovskiy—set out to characterize when scale-invariant upper bounds are possible. They prove a stark dichotomy: without further assumptions, nontrivial scale-invariant bounds exist only in dimension d=1, where they obtain O(log T) bounds without any covariate assumptions. For d>1, they show that no nontrivial scale-invariant bound can hold in full generality.
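
For context, the standard regularized bound has the following shape (in the style of Abbasi-Yadkori, Pál, and Szepesvári, 2011); the notation below is a minimal illustrative sketch, not taken from the paper.

```latex
% Sketch of the standard regularized self-normalized bound (Abbasi-Yadkori et al. style).
% Let S_t = \sum_{s=1}^{t} \varepsilon_s x_s with conditionally \sigma-sub-Gaussian noise
% \varepsilon_s, and let V_t(\lambda) = \lambda I + \sum_{s=1}^{t} x_s x_s^\top.
% Then, with probability at least 1 - \delta, simultaneously for all t:
\[
  \|S_t\|_{V_t(\lambda)^{-1}}^2
  \;\le\;
  2\sigma^2 \log\!\left( \frac{\det\bigl(V_t(\lambda)\bigr)^{1/2}}{\delta\,\lambda^{d/2}} \right).
\]
% Rescaling the covariates x_s \mapsto c\, x_s leaves the unregularized quantity
% \|S_t\|_{V_t(0)^{-1}} unchanged, but the right-hand side changes through
% \det V_t(\lambda) and the fixed \lambda, so the bound is not scale-invariant.
```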

This result directly resolves an open question from Gaillard, Gerchinovitz, Huard, and Stoltz (ALT 2019) on doubly-uniform regret for sequential linear regression with the square loss. The researchers demonstrate that in d=1 an explicit algorithm achieves O(log T) doubly-uniform regret, while for d>1 sublinear doubly-uniform regret is impossible. However, by introducing a natural smoothness condition (bounded Radon-Nikodym derivatives of the conditional covariate laws with respect to a fixed base measure), they recover sublinear regret for d>1 without bounded covariates and derive a self-normalized concentration inequality free of the usual regularization penalties. This yields what is arguably the first natural scale-invariant bound for adaptive, non-i.i.d. vector martingales.
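
To fix ideas, the regret notion in question can be written as follows; the "doubly uniform" requirement is paraphrased from the problem setup, and the notation is ours rather than the paper's.

```latex
% Square-loss regret of predictions \hat y_t against a fixed comparator u \in \mathbb{R}^d:
\[
  \mathrm{Reg}_T(u)
  \;=\;
  \sum_{t=1}^{T} \bigl(y_t - \hat y_t\bigr)^2
  \;-\;
  \sum_{t=1}^{T} \bigl(y_t - \langle u, x_t \rangle\bigr)^2 .
\]
% Roughly, a doubly-uniform guarantee bounds \mathrm{Reg}_T(u) simultaneously for all
% comparators u \in \mathbb{R}^d and without boundedness assumptions on the data
% (x_t, y_t); such a bound must in particular be invariant under rescaling of the data.
```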

Key Points
  • Proved impossibility of nontrivial scale-invariant self-normalized bounds in dimensions d>1 without extra assumptions.
  • Achieved O(log T) doubly-uniform regret in one dimension, resolving an open problem from ALT 2019 (see the sketch after this list).
  • Under a smoothness condition, recovered sublinear regret and a regularization-free concentration inequality for vector martingales.
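
To make the one-dimensional case concrete, here is a minimal sketch of unregularized online least squares in d=1, written in Python. It illustrates the sequential prediction protocol behind the O(log T) claim; it is not the paper's explicit algorithm, whose exact form is not reproduced here.

```python
import random

# Minimal sketch: unregularized online least squares in d = 1.
# Illustrates the sequential protocol only; NOT the paper's explicit algorithm.

def online_ls_1d(stream):
    """Predict y_t with the least-squares slope fitted to rounds 1..t-1."""
    sxx = 0.0   # running sum of x_s^2
    sxy = 0.0   # running sum of x_s * y_s
    loss = 0.0  # cumulative square loss of the forecaster
    for x, y in stream:
        w = sxy / sxx if sxx > 0 else 0.0  # current least-squares slope
        loss += (y - w * x) ** 2           # predict before observing y_t
        sxx += x * x                       # then update the statistics
        sxy += x * y
    return loss

# Usage: noisy line y = 2x + noise; compare against the best fixed slope.
rng = random.Random(0)
xs = [rng.gauss(0.0, 1.0) for _ in range(1000)]
data = [(x, 2.0 * x + 0.1 * rng.gauss(0.0, 1.0)) for x in xs]
best = min(  # square loss of the best fixed comparator, on a coarse grid
    sum((y - u * x) ** 2 for x, y in data) for u in [i / 100 for i in range(100, 301)]
)
print("regret vs best fixed slope:", online_ls_1d(iter(data)) - best)
```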

Why It Matters

This work clarifies the fundamental limits of online linear regression and guides the design of scale-robust algorithms for bandits and reinforcement learning.