Research & Papers

SHIFT: Robust Double Machine Learning for Average Dose-Response Functions under Heavy-Tailed Contamination

On a localized-contamination stress test with 25% outliers, SHIFT cuts level-RMSE from 1.03 to 0.33; on Gaussian-jump DGPs it recovers the outlier mask with mean F1 ≈ 0.96.

Deep Dive

Standard double-machine-learning (DML) pipelines for average dose-response functions (ADRF) rely on kernel-weighted local-linear smoothers, which are vulnerable to heavy-tailed contamination: a single outlier within a kernel window can bias the entire curve. Eichi Uehara’s new paper introduces SHIFT (Self-calibrated Heavy-tail Inlier-Fit with Tempering), a robust DML estimator designed for this setting. The method combines cross-fit nuisance orthogonalization with a kernel-local Welsch-loss second stage optimized by Graduated Non-Convexity (GNC). Its key design choice is a defensive OLS refit in which the inlier cutoff is scaled by the MAD of the post-GNC residuals rather than the MAD of the raw outcomes. On a localized-contamination stress test with 25% outliers, this choice drops level-RMSE from 1.03 to 0.33 while leaving results on clean and uniformly contaminated data unchanged. Across 1,400 main-sweep fits, SHIFT achieves competitive worst-case shape recovery (RMSE 0.325 at p=0.25). Among the top three methods with RMSE below 0.35, only SHIFT outputs a non-uniform per-sample weight vector, and that vector recovers the true outlier mask with mean F1 ≈ 0.96 (range 0.945–0.968) on Gaussian-jump DGPs.
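To make the second stage concrete, here is a minimal Python sketch of the general idea, not the paper's implementation. It assumes a single global linear fit instead of a kernel-local one and skips cross-fitting entirely. It anneals the Welsch scale from large to small as a simple GNC-style schedule, then refits OLS on samples whose residuals fall within k times the MAD of the robust-fit residuals. The function names, the schedule parameters (c_start, c_final, shrink) and the cutoff k = 3 are illustrative assumptions.

```python
import numpy as np

def mad_scale(r):
    """Median absolute deviation, scaled to match sigma under Gaussian noise."""
    return 1.4826 * np.median(np.abs(r - np.median(r)))

def welsch_gnc_fit(x, y, c_final=3.0, c_start=20.0, shrink=0.7, inner_iters=20):
    """Fit y ~ b0 + b1*x under a Welsch loss by IRLS while annealing the scale c.

    A large c makes the loss nearly quadratic (close to OLS); shrinking c
    gradually restores the non-convex, outlier-rejecting shape.
    All default values here are illustrative assumptions.
    """
    A = np.column_stack([np.ones_like(x), x])
    beta = np.linalg.lstsq(A, y, rcond=None)[0]      # plain OLS start
    c = c_start
    while True:
        for _ in range(inner_iters):
            r = y - A @ beta
            s = max(mad_scale(r), 1e-12)             # residual scale, guarded
            w = np.exp(-0.5 * (r / (c * s)) ** 2)    # Gaussian-shaped Welsch weights
            sw = np.sqrt(w)
            beta = np.linalg.lstsq(A * sw[:, None], y * sw, rcond=None)[0]
        if c <= c_final:
            break
        c = max(c * shrink, c_final)
    return beta

def defensive_refit(x, y, beta, k=3.0):
    """OLS refit on inliers; the cutoff uses the MAD of the robust-fit residuals."""
    A = np.column_stack([np.ones_like(x), x])
    r = y - A @ beta
    inlier = np.abs(r - np.median(r)) <= k * mad_scale(r)
    beta_ols = np.linalg.lstsq(A[inlier], y[inlier], rcond=None)[0]
    return beta_ols, inlier

def f1_score_masks(pred, truth):
    """F1 score between a predicted and a true boolean outlier mask."""
    tp = np.sum(pred & truth)
    fp = np.sum(pred & ~truth)
    fn = np.sum(~pred & truth)
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0

# Toy check: a line with a localized block of shifted outcomes (25% of samples).
rng = np.random.default_rng(0)
x = np.linspace(0.0, 1.0, 400)
y = 1.0 + 2.0 * x + rng.normal(0.0, 0.1, size=x.size)
outlier = np.zeros(x.size, dtype=bool)
outlier[150:250] = True
y[outlier] += 5.0

beta_robust = welsch_gnc_fit(x, y)
beta_refit, inlier = defensive_refit(x, y, beta_robust)
print("refit coefficients:", beta_refit)
print("outlier-mask F1:", f1_score_masks(~inlier, outlier))
```

The cutoff is computed from residuals of the robust fit because the spread of the raw outcomes mixes the signal's own variation with the contamination, whereas post-fit residuals isolate the noise scale. The toy data places the contaminated block symmetrically in the covariate range; less favorable layouts may need a better starting point than OLS.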

The paper also includes a six-technique Extreme Value Theory diagnostic suite (Hill, GPD-MLE/PWM, GEV, Mean Excess, parameter stability, causal tail coefficient) that helps practitioners distinguish Fréchet from Weibull regimes and choose empirically between SHIFT and L1 alternatives. Extensions cover binary-treatment CATE using a Huber pseudo-outcome X-Learner and time-series ADRF via block-CV with rolling MAD. A counter-intuitive ablation reveals that linear nuisance models (Ridge, Lasso) outperform gradient-boosted nuisances for robust DML under uniform contamination, which inverts the usual bias-variance heuristic. The work is particularly relevant to causal inference in fields such as econometrics and epidemiology, and in any domain with messy, heavy-tailed observational data, because it offers a principled way to protect estimates without blindly discarding outliers.
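As an illustration of the simplest diagnostic in that list, the following Python sketch computes the classical Hill estimate of the extreme value index from the k largest observations. It is a generic textbook version, not code from the paper; the function name, the choice of k and the synthetic Pareto sample are assumptions made for the example. The Hill estimator targets only the heavy-tailed (Fréchet) case with a positive index, so it cannot identify a Weibull regime on its own.

```python
import numpy as np

def hill_estimator(sample, k):
    """Hill estimate of the extreme value index gamma from the k largest values."""
    x = np.sort(np.asarray(sample, dtype=float))[::-1]   # descending order
    if not 0 < k < x.size:
        raise ValueError("k must satisfy 0 < k < len(sample)")
    if x[k] <= 0:
        raise ValueError("the k+1 largest values must be positive")
    # Mean log-excess of the top k values over the (k+1)-th largest value.
    return float(np.mean(np.log(x[:k])) - np.log(x[k]))

# Synthetic check on a classical Pareto sample with tail index alpha = 2,
# for which the extreme value index is gamma = 1 / alpha = 0.5.
rng = np.random.default_rng(1)
pareto = 1.0 + rng.pareto(2.0, size=20_000)
gamma_hat = hill_estimator(pareto, k=1_000)
print("gamma_hat:", gamma_hat, "implied tail index:", 1.0 / gamma_hat)
```

In practice the estimate is sensitive to k, so it is usually inspected across a range of k values rather than read off at a single one.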

Key Points
  • SHIFT reduces level-RMSE from 1.03 to 0.33 on localized contamination at a 25% outlier rate
  • Outlier mask recovery achieves mean F1 ≈ 0.96 (range 0.945–0.968) on Gaussian-jump DGPs
  • Linear nuisance models beat flexible gradient-boosted ones for robust DML under uniform contamination

Why It Matters

SHIFT makes causal inference from heavy-tailed data more reliable, which is critical for policy and medical dose-response studies.