A Finite Time Analysis of Thompson Sampling for Bayesian Optimization with Preferential Feedback
A new proof shows that Thompson Sampling with pairwise comparisons performs as well as standard Bayesian optimization with scalar feedback.
A new paper from Joseph Lazzaro, Davide Buffelli, Da-shan Shiu, and Sattar Vakili, accepted at AISTATS 2026, presents a finite-time analysis of Thompson Sampling (TS) for Bayesian optimization (BO) with preferential feedback—where the algorithm learns from pairwise comparisons (e.g., "A is better than B") instead of scalar scores. This is increasingly relevant for human-in-the-loop design, laboratory experiments, and scientific discovery where direct numerical feedback is impractical. The method models comparisons using a monotone link function on latent utility differences and leverages a dueling kernel induced by a base kernel.
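To make the modeling setup concrete, here is a minimal NumPy sketch of the two ingredients described above: a dueling kernel on pairs induced by a base kernel (shown here via the standard utility-difference construction), and a monotone link mapping latent utility differences to win probabilities. The RBF base kernel and logistic link are illustrative assumptions, not necessarily the paper's exact choices.

```python
import numpy as np

def rbf(x, y, lengthscale=0.5):
    # Base kernel k(x, y) on single points (illustrative choice).
    return np.exp(-np.sum((x - y) ** 2) / (2 * lengthscale ** 2))

def dueling_kernel(pair_a, pair_b, k=rbf):
    # Kernel on pairs induced by the base kernel: the covariance of the
    # latent utility differences f(x) - f(x') under a GP prior with
    # covariance k.
    (x, xp), (y, yp) = pair_a, pair_b
    return k(x, y) - k(x, yp) - k(xp, y) + k(xp, yp)

def link(z):
    # Monotone link: maps a latent utility difference f(x) - f(x') to the
    # probability that x beats x' (logistic link assumed here).
    return 1.0 / (1.0 + np.exp(-z))
```

A comparison outcome "A beats B" is then modeled as a Bernoulli draw with probability `link(f(A) - f(B))`, and the GP over pairs uses `dueling_kernel` as its covariance.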
The key theoretical contribution is a proof that the proposed TS approach achieves the same finite-time regret bounds as standard TS for conventional BO with scalar feedback—a significant step in formalizing the efficiency of preference-based optimization. The analysis exploits the anchor invariance property of TS for challenger selection and introduces a novel double-TS pairing variant. Experimental results on both synthetic benchmarks and real-world applications demonstrate the method's effectiveness, bridging theory and practice for preference-driven optimization tasks.
- First finite-time analysis proving Thompson Sampling for preferential BO matches standard BO performance bounds.
- Uses a monotone link function on latent utility differences and a dueling kernel for pairwise comparisons.
- Introduces a double-TS pairing variant that exploits anchor invariance for challenger selection.
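The double-TS pairing idea can be sketched as follows: draw two independent posterior samples of the latent utility over a finite candidate set and duel their respective argmaxes. This is a hypothetical illustration assuming a Gaussian posterior on a discrete grid; the posterior update itself (from preferential data) is not shown, and the function is not the paper's implementation.

```python
import numpy as np

def double_ts_pair(mu, Sigma, rng):
    """Double-TS pairing over a finite candidate set.

    mu, Sigma: posterior mean and covariance of the latent utility
    (assumed produced by a preferential-GP update, omitted here).
    Returns the indices of the incumbent and the challenger to duel.
    """
    # First posterior sample selects the incumbent arm.
    f1 = rng.multivariate_normal(mu, Sigma)
    i = int(np.argmax(f1))
    # An independent second sample selects the challenger. By anchor
    # invariance, shifting the utility by any constant leaves both
    # argmaxes, and hence the chosen duel, unchanged.
    f2 = rng.multivariate_normal(mu, Sigma)
    j = int(np.argmax(f2))
    return i, j
```

Only the pairwise comparison between candidates `i` and `j` is then queried, and its outcome updates the posterior for the next round.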
Why It Matters
Enables efficient Bayesian optimization when only pairwise comparisons are available, which is critical for human-in-the-loop design and scientific discovery workflows.