Research & Papers

Budget-Constrained Causal Bandits: Bridging Uplift Modeling and Sequential Decision-Making

Offline ad models need ~10,000 samples; this online method works from the very first user.

Deep Dive

A new paper from Abhirami Pillai introduces Budget-Constrained Causal Bandits (BCCB), an online framework that unifies treatment effect learning, exploration, and budget pacing into a single sequential decision process. Unlike traditional two-stage offline pipelines—which collect historical data to estimate heterogeneous treatment effects (HTE) and then solve a constrained optimization—BCCB operates from the very first user. This makes it ideal for cold-start scenarios like new ad campaigns, markets, or customer segments where historical data is scarce.
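To make the idea concrete, here is a minimal sketch of the kind of loop the paper describes: a Beta-Bernoulli Thompson Sampling agent that, for each arriving user, samples posterior conversion rates, treats only when the sampled uplift is positive and budget remains, and updates its beliefs from the observed outcome. This is an illustrative simplification under assumed parameters (unit treatment cost, two-arm Bernoulli rewards, hypothetical class and function names), not the paper's exact BCCB algorithm.

```python
import random

class BudgetedUpliftBandit:
    """Illustrative sketch, not the paper's exact BCCB algorithm:
    Beta-Bernoulli Thompson Sampling for the per-user treat/skip
    decision under a hard treatment budget."""

    def __init__(self, budget, cost_per_treatment=1.0, seed=0):
        self.rng = random.Random(seed)
        self.budget = budget
        self.cost = cost_per_treatment
        # Beta(alpha, beta) posteriors over conversion rates per arm.
        self.alpha = {"treat": 1.0, "control": 1.0}
        self.beta = {"treat": 1.0, "control": 1.0}

    def decide(self):
        """Sample conversion rates from each posterior; treat only if the
        sampled uplift is positive and the budget covers the cost."""
        if self.budget < self.cost:
            return "control"
        p_treat = self.rng.betavariate(self.alpha["treat"], self.beta["treat"])
        p_ctrl = self.rng.betavariate(self.alpha["control"], self.beta["control"])
        return "treat" if p_treat > p_ctrl else "control"

    def update(self, arm, converted):
        """Spend budget if we treated, then do the Bayesian update."""
        if arm == "treat":
            self.budget -= self.cost
        if converted:
            self.alpha[arm] += 1.0
        else:
            self.beta[arm] += 1.0

def run(n_users=2000, budget=500, seed=0):
    """Simulated campaign (assumed rates): treatment lifts a user's
    conversion probability from 5% to 15%."""
    env_rng = random.Random(seed)
    bandit = BudgetedUpliftBandit(budget=budget, seed=seed)
    conversions = 0
    for _ in range(n_users):
        arm = bandit.decide()
        p = 0.15 if arm == "treat" else 0.05
        converted = env_rng.random() < p
        bandit.update(arm, converted)
        conversions += converted
    return conversions, bandit.budget
```

Because learning and spending happen in the same loop, the agent needs no historical data: it starts exploring with flat priors on user one and shifts toward exploitation as the posteriors sharpen, which is the cold-start property the paper emphasizes.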

Evaluated on the Criteo Uplift dataset (a large-scale advertising RCT), the experiments show that offline methods need roughly 10,000 historical observations to produce reliable results, while BCCB works effectively from the start. BCCB also exhibits 3-5x lower performance variance between runs, making it more dependable for real campaign planning. Among purely online methods, it consistently outperforms standard Thompson Sampling, budgeted Thompson Sampling, and greedy HTE estimation across all tested budget levels.

Key Points
  • BCCB eliminates the need for ~10,000 historical observations required by offline methods
  • Achieves 3-5x lower performance variance between runs for more reliable planning
  • Outperforms standard Thompson Sampling and greedy HTE across all budget levels on Criteo data

Why It Matters

Advertisers can launch campaigns instantly in new markets without waiting for historical data.