Research & Papers

1 bit per batch is enough: new bandit algorithms achieve near-optimal regret

Linear bandits with just one bit of feedback per batch nearly match unconstrained performance.

Deep Dive

A team of researchers studied stochastic linear bandits under combined batching and communication constraints. The time horizon T is split into batches of equal size B. In each batch, the learner sends the agent B arms to pull, and the agent pulls them and responds with a single bit of feedback. The authors proved a minimax lower bound and designed two phased-elimination algorithms whose regret is within logarithmic factors of the minimax regret in the unconstrained setting, even for batch sizes as large as Θ(√T). In other words, a single bit of feedback per batch suffices to nearly match unconstrained linear bandits across broad scaling regimes.
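The summary does not say how the agent compresses a batch of rewards into one bit, so the following is only a minimal sketch of the protocol under stated assumptions. It uses a generic stochastic one-bit quantizer as a stand-in, not the paper's rule: the agent clips the batch-average reward to [-1, 1] and sends a Bernoulli bit whose mean encodes that average. The names `one_bit_batch_feedback`, `decode`, and `noise_std` are hypothetical.

```python
import numpy as np

def one_bit_batch_feedback(actions, theta, rng, noise_std=0.1):
    """Agent side: pull every arm in the batch, then return a single bit.

    Assumed rule (not taken from the paper): clip the batch-average reward
    to [-1, 1], map it to p in [0, 1], and send a Bernoulli(p) bit.
    """
    # Linear rewards <a, theta> plus Gaussian noise, one per pull.
    rewards = actions @ theta + rng.normal(0.0, noise_std, size=len(actions))
    avg = np.clip(rewards.mean(), -1.0, 1.0)
    p = (avg + 1.0) / 2.0
    return int(rng.random() < p)

def decode(bits):
    """Learner side: unbiased estimate of the clipped batch-average reward."""
    return 2.0 * float(np.mean(bits)) - 1.0

rng = np.random.default_rng(0)
theta = np.array([0.5, 0.0])                      # unknown to the learner
batch = np.tile(np.array([1.0, 0.0]), (20, 1))    # B = 20 pulls of one arm
bits = [one_bit_batch_feedback(batch, theta, rng) for _ in range(2000)]
print(decode(bits))  # close to the true mean reward 0.5
```

Because each bit has mean (avg + 1) / 2, averaging bits over repeated batches recovers the batch-average reward without bias. The price is extra variance per batch, which is one intuition for why regret picks up a term that grows with B.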

Key Points
  • Setting: batched linear bandits where the agent returns only 1 bit per batch; the learner designs the quantization rule for each batch.
  • Minimax lower bound: Ω(B min{d, log|A|}) regret from communication alone, plus standard statistical terms.
  • Two algorithms achieve Õ(dB + d√T) and Õ(B log|A| + d^{3/2}√B + √(dT log|A|)) regret, nearly matching lower bounds.
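To see why batch sizes up to about √T cost little, it helps to plug numbers into the bounds above. The sketch below evaluates the two upper bounds and the communication lower-bound term with all constants and logarithmic factors dropped, so the values indicate scaling only, not actual regret. The function names are hypothetical.

```python
import math

def comm_lower_term(d, B, num_arms):
    """Communication term of the lower bound: B * min{d, log|A|}."""
    return B * min(d, math.log(num_arms))

def bound_alg1(d, B, T):
    """First upper bound, d*B + d*sqrt(T), ignoring log factors."""
    return d * B + d * math.sqrt(T)

def bound_alg2(d, B, T, num_arms):
    """Second upper bound, B*log|A| + d^{3/2}*sqrt(B) + sqrt(d*T*log|A|)."""
    log_a = math.log(num_arms)
    return B * log_a + d ** 1.5 * math.sqrt(B) + math.sqrt(d * T * log_a)

d, T, num_arms = 10, 10**6, 100
B = int(math.sqrt(T))  # batch size at the sqrt(T) threshold

print(bound_alg1(d, B, T))            # 20000.0: d*B equals d*sqrt(T) here
print(bound_alg2(d, B, T, num_arms))  # smaller, since log|A| < d
print(comm_lower_term(d, B, num_arms))
```

At B = √T the batching term dB equals the statistical term d√T, so the first bound is only twice its unconstrained part. For larger B the dB term dominates. The second bound is the better one when the arm set is small enough that log|A| is below d.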

Why It Matters

The result enables near-optimal learning under extreme communication constraints, which matters for distributed sensing and low-power AI agents.