One bit per batch is enough: new bandit algorithms achieve near-optimal regret

arXiv stat.ML June 01, 2026

⚡ Linear bandits with just one bit of feedback per batch can nearly match unconstrained regret.

Deep Dive

A team of researchers studied stochastic linear bandits under combined batching and communication constraints: the time horizon is split into batches of equal size B, and in each batch the learner sends B arm pulls, after which the agent responds with a single bit of feedback. They proved a minimax lower bound and designed two phased-elimination algorithms whose regret is within logarithmic factors of the unconstrained setting, even for batch sizes as large as Θ(√T). A single bit of feedback per batch therefore suffices to nearly match the minimax regret of unconstrained linear bandits in broad scaling regimes.
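To make the protocol concrete, here is a minimal Python sketch of a single batch. Everything named in it is an assumption made for illustration: the function names `run_batch` and `make_threshold_rule`, the Gaussian noise model, and above all the threshold rule itself, which is a stand-in and not the quantization rule used in the paper.

```python
import random

def run_batch(arms, theta, rule, rng, noise_sd=1.0):
    """Agent side: pull every arm in the batch, then compress the rewards to one bit."""
    rewards = []
    for x in arms:
        # Linear reward model: <x, theta> plus noise (Gaussian noise is an assumption).
        mean = sum(xi * ti for xi, ti in zip(x, theta))
        rewards.append(mean + rng.gauss(0.0, noise_sd))
    # The learner chose `rule` before the batch; only its 1-bit output is sent back.
    return 1 if rule(rewards) else 0

def make_threshold_rule(tau):
    """Hypothetical quantization rule: is the batch-average reward above tau?"""
    def rule(rewards):
        return sum(rewards) / len(rewards) > tau
    return rule

rng = random.Random(0)
theta = [0.8, -0.2]          # unknown parameter, hidden from the learner
batch = [[1.0, 0.0]] * 50    # the learner's B = 50 arm pulls for this batch
bit = run_batch(batch, theta, make_threshold_rule(0.0), rng)
```

The learner never sees the 50 individual rewards, only `bit`, so all of the design effort goes into choosing the arms and the rule for each batch.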

Key Points

Setting: batched linear bandits where the agent returns only 1 bit per batch; the learner designs the quantization rule for each batch.
Minimax lower bound: Ω(B min{d, log|A|}) regret from the communication constraint alone, plus the standard statistical terms.
Two algorithms achieve Õ(dB + d√T) and Õ(B log|A| + d^{3/2}√B + √(dT log|A|)) regret, respectively, nearly matching the lower bounds.
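To see how these terms trade off, the sketch below evaluates the leading terms of the stated bounds in Python. It drops all constants and the logarithmic factors hidden by Õ, so the numbers indicate scaling only and are not regret values; the function names and the example parameters are illustrative assumptions.

```python
import math

def bound_one(d, B, T):
    """Leading terms of the first upper bound: dB + d*sqrt(T)."""
    return d * B + d * math.sqrt(T)

def bound_two(d, B, T, num_arms):
    """Leading terms of the second upper bound, for a finite arm set of size num_arms."""
    log_a = math.log(num_arms)
    return B * log_a + d ** 1.5 * math.sqrt(B) + math.sqrt(d * T * log_a)

def comm_lower(d, B, num_arms):
    """Communication term of the lower bound: B * min(d, log|A|)."""
    return B * min(d, math.log(num_arms))

# Example values (assumed): d = 10, T = 10**6, B = sqrt(T) = 1000, |A| = 100.
d, T, B, num_arms = 10, 10**6, 1000, 100
first = bound_one(d, B, T)            # dB and d*sqrt(T) are both 10000 here
second = bound_two(d, B, T, num_arms)
lower = comm_lower(d, B, num_arms)
```

With B = √T the batch term dB equals the statistical term d√T, which is why batches as large as Θ(√T) leave the first bound at the unconstrained d√T scale.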

Why It Matters

Enables near-optimal learning with extreme communication constraints, key for distributed sensing and low-power AI agents.
