GP-CATE delivers calibrated treatment effect estimates in trials with small placebo arms
Standard X-Learner intervals under-cover; GP-CATE fixes the bias with Gaussian processes.
Estimating how much an intervention helps individuals with particular characteristics, known as the conditional average treatment effect (CATE), is critical in medicine, economics, and A/B testing. But when one treatment arm is much smaller than the other (the few-placebo regime), standard methods produce unreliable uncertainty intervals. The popular X-Learner, when given a Bayesian second stage, yields intervals that contain the true effect less often than their nominal level claims. The root cause: the second-stage regression target is built from a nuisance outcome model fit to the small arm, so it inherits that model's bias, and doubly robust corrections fail under limited overlap between the arms.
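To make the bias mechanism concrete, here is a toy simulation. Everything in it is a hypothetical assumption for illustration, not the paper's setup: the outcome surfaces `mu0` and `tau`, the sample sizes (15 controls, 500 treated), the narrow control covariate range used to mimic limited overlap, and the linear first-stage model. It computes the X-Learner's imputed effect for treated units, D1 = Y1 - mu0_hat(X1), which equals the true effect plus the control model's error plus noise.

```python
import numpy as np

rng = np.random.default_rng(0)

def mu0(x):
    # Hypothetical true control outcome surface (nonlinear).
    return np.sin(2 * x)

def tau(x):
    # Hypothetical true CATE.
    return 1.0 + 0.5 * x

# Few-placebo regime (assumed sizes): 15 controls, 500 treated units.
# Controls cover only [-1, 0] to mimic limited overlap; treated cover [-1, 2].
x0 = rng.uniform(-1.0, 0.0, size=15)
x1 = rng.uniform(-1.0, 2.0, size=500)
y0 = mu0(x0) + rng.normal(0, 0.1, size=x0.size)
y1 = mu0(x1) + tau(x1) + rng.normal(0, 0.1, size=x1.size)

# Stage 1: fit the control-arm nuisance model on the small arm only.
# A straight line is used here for simplicity; any model fit on 15 points has error.
mu0_hat = np.poly1d(np.polyfit(x0, y0, deg=1))

# X-Learner second-stage target for treated units: D1 = Y1 - mu0_hat(X1).
d1 = y1 - mu0_hat(x1)

# D1 - tau(X1) = (mu0 - mu0_hat)(X1) + noise, so the nuisance error is carried
# into the target and does not shrink as more treated units are added.
bias = mu0(x1) - mu0_hat(x1)
print(f"mean |nuisance bias| in treated pseudo-outcomes: {np.mean(np.abs(bias)):.3f}")
```

Because the error term depends only on the 15-point control fit, a second-stage posterior built on D1 can tighten around a target that is still off, which is one way intervals end up under-covering.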
Uehara introduces GP-CATE, which models each arm's outcome surface with Gaussian processes. This lets uncertainty from the scarce arm flow directly into the posterior, avoiding the bias that plagued earlier approaches. Across synthetic and semi-synthetic benchmarks, GP-CATE achieves calibrated coverage where Causal Forest and BART fall short. The trade-off: intervals are appropriately wider when data is sparse. The method is presented in a 14-page paper on arXiv (2605.27473) with 1 figure and 5 tables.
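A minimal sketch of the per-arm Gaussian process idea follows, assuming independent zero-mean GPs for each arm with a fixed squared-exponential kernel and fixed noise variance (no hyperparameter fitting). This is a plain two-GP, T-Learner-style construction chosen for illustration; the paper's GP-CATE estimator may combine the GPs differently. The helper names `rbf` and `gp_posterior` and all data are hypothetical. Under these assumptions the CATE posterior at a point is Gaussian with mean m1 - m0 and variance v1 + v0, so uncertainty from the scarce control arm enters the interval directly.

```python
import numpy as np

def rbf(a, b, lengthscale=0.5, variance=1.0):
    # Squared-exponential kernel between 1-D input arrays a and b (assumed hyperparameters).
    d = a[:, None] - b[None, :]
    return variance * np.exp(-0.5 * (d / lengthscale) ** 2)

def gp_posterior(x_train, y_train, x_test, noise=0.1 ** 2):
    # Exact GP regression: posterior mean and marginal variance at x_test.
    K = rbf(x_train, x_train) + noise * np.eye(x_train.size)
    Ks = rbf(x_train, x_test)
    L = np.linalg.cholesky(K)
    alpha = np.linalg.solve(L.T, np.linalg.solve(L, y_train))
    mean = Ks.T @ alpha
    v = np.linalg.solve(L, Ks)
    var = np.diag(rbf(x_test, x_test)) - np.sum(v ** 2, axis=0)
    return mean, np.maximum(var, 0.0)  # clip tiny negative values from round-off

rng = np.random.default_rng(1)
x0 = rng.uniform(-1.0, 0.0, 15)    # scarce control arm, narrow support
x1 = rng.uniform(-1.0, 2.0, 500)   # large treated arm
y0 = np.sin(2 * x0) + rng.normal(0, 0.1, x0.size)
y1 = np.sin(2 * x1) + 1.0 + 0.5 * x1 + rng.normal(0, 0.1, x1.size)

x_grid = np.array([-0.5, 1.5])     # one point inside, one outside the control support
m0, v0 = gp_posterior(x0, y0, x_grid)
m1, v1 = gp_posterior(x1, y1, x_grid)

# Independent per-arm posteriors: CATE ~ Normal(m1 - m0, v1 + v0).
cate_mean = m1 - m0
cate_sd = np.sqrt(v1 + v0)
lo, hi = cate_mean - 1.96 * cate_sd, cate_mean + 1.96 * cate_sd
```

At x = 1.5 the control GP has essentially no nearby data, so v0 stays close to the prior variance and the 95% interval is far wider than at x = -0.5. That is the "appropriately wider" behavior in miniature: the scarce arm's uncertainty is reported rather than hidden.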
- Standard X-Learner with a Bayesian second stage under-covers: bias from the small-arm nuisance model centers the posterior away from the true effect.
- GP-CATE uses Gaussian processes to model each arm's outcome surface, directly incorporating uncertainty from the scarce arm rather than leaving it as unmodeled bias.
- On benchmarks, GP-CATE achieves calibrated coverage where Causal Forest and BART fail, at the cost of appropriately wider intervals when data is uninformative.
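Coverage claims like these are checked empirically: over many units with known true effects, count how often the nominal 95% interval contains the truth. The sketch below uses synthetic numbers (not results from the paper) to show that an interval of the right width but centered away from the truth, as in the first bullet, under-covers. The function names `empirical_coverage` and `mean_width` are hypothetical.

```python
import numpy as np

def empirical_coverage(lower, upper, true_effect):
    # Fraction of units whose interval [lower, upper] contains the true effect.
    lower, upper, true_effect = map(np.asarray, (lower, upper, true_effect))
    return float(np.mean((lower <= true_effect) & (true_effect <= upper)))

def mean_width(lower, upper):
    # Average interval width; calibrated methods may pay for coverage here.
    return float(np.mean(np.asarray(upper) - np.asarray(lower)))

rng = np.random.default_rng(2)
tau_true = rng.normal(0, 1, 10_000)                    # synthetic true effects
est = tau_true + rng.normal(0, 0.5, tau_true.size)     # unbiased estimates, sd 0.5
half = 1.96 * 0.5                                      # nominal 95% half-width

cov_unbiased = empirical_coverage(est - half, est + half, tau_true)

biased = est + 0.5                                     # same width, shifted off-center
cov_biased = empirical_coverage(biased - half, biased + half, tau_true)

print(f"unbiased: {cov_unbiased:.3f}, biased: {cov_biased:.3f}")
```

In this toy setup the unbiased intervals land near 0.95 while the shifted ones fall to roughly 0.83, even though both have identical widths. Reporting coverage alongside `mean_width` makes the trade-off in the last bullet visible.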
Why It Matters
Better uncertainty quantification in small-sample treatment effects improves decision-making in medicine, policy, and A/B testing.