What Do We Care About in Bandits with Noncompliance? BRACE: Bandits with Recommendations, Abstention, and Certified Effects
A new algorithm handles settings where AI recommendations differ from the treatments actually delivered, providing valid uncertainty intervals and policy identification.
Researcher Nicolás Della Penna has introduced BRACE (Bandits with Recommendations, Abstention, and Certified Effects), a novel algorithm addressing a fundamental problem in AI decision-making: what happens when there's a gap between what an AI system recommends and what treatment actually gets delivered to users? This "noncompliance" scenario is common in real-world applications like healthcare recommendations, financial advice platforms, and content moderation systems where downstream actors (doctors, advisors, moderators) may override or modify AI suggestions based on private information.
BRACE is a parameter-free phase-doubling algorithm designed specifically for finite-context square-IV problems, where the number of recommendation arms matches the number of treatments so the compliance matrix is square. The algorithm's key innovation is that it performs instrumental variable (IV) inversion only after matrix certification, otherwise returning full-range but honest structural intervals. This approach delivers three crucial guarantees: simultaneous policy-value validity, fixed-gap identification of the operationally optimal recommendation policy, and fixed-gap identification of the structurally optimal treatment policy under contextual homogeneity and invertibility conditions.
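The certify-then-invert idea can be sketched in a few lines. This is a hedged illustration, not the paper's exact procedure: the threshold `tau`, the margin formula, and the interval propagation are placeholder choices standing in for BRACE's actual certification test.

```python
import numpy as np

def certified_iv_intervals(P, q, n, tau, y_lo=0.0, y_hi=1.0):
    """Illustrative sketch: invert a square compliance matrix only when it
    is certifiably well-conditioned; otherwise abstain with full-range
    (but still valid) intervals.

    P   : (K, K) estimated compliance matrix, P[z, d] = Pr(D=d | Z=z)
    q   : (K,) estimated conditional outcome means, q[z] = E[Y | Z=z]
    n   : sample count behind the estimates (drives the margin)
    tau : certification threshold on the smallest singular value
    """
    sigma_min = np.linalg.svd(P, compute_uv=False).min()
    margin = np.sqrt(np.log(max(n, 2)) / max(n, 1))  # illustrative radius
    if sigma_min > tau + margin:
        mu = np.linalg.solve(P, q)            # IV inversion: mu = P^{-1} q
        half = margin / (sigma_min - margin)  # crude propagated half-width
        return [(m - half, m + half) for m in mu]
    # Certification failed: abstain, return full-range honest intervals.
    return [(y_lo, y_hi) for _ in range(len(q))]
```

With a near-singular compliance matrix (weak identification), the function refuses to invert and reports the trivial-but-honest outcome range instead of a falsely tight interval.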
The research formalizes the objective-choice problem that platforms face—whether to optimize for recommendation welfare in current mediated workflows, treatment learning for future direct-control regimes, or anytime-valid uncertainty quantification. The paper demonstrates through examples that recommendation welfare can actually exceed every learner-measurable treatment policy when downstream actors leverage private information, challenging conventional wisdom about always optimizing for direct treatment effects.
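A toy simulation, not taken from the paper, illustrates how this can happen. Suppose a downstream actor observes a private binary signal S that the learner never sees, and treatment d = S pays 1 while the other treatment pays 0. Every learner-measurable treatment policy must ignore S and earns 0.5 at best, while the mediated workflow, in which the actor overrides using S, earns 1.0:

```python
import random

def toy_welfare(num_rounds=100_000, seed=0):
    """Hypothetical example: recommendation welfare vs. the best
    learner-measurable treatment policy when the downstream actor
    holds a private signal."""
    rng = random.Random(seed)
    rec_welfare = 0.0    # actor free to override using the private signal
    direct_best = 0.0    # best fixed learner policy (say, always d = 0)
    for _ in range(num_rounds):
        s = rng.randint(0, 1)          # private signal, hidden from learner
        rec_welfare += 1.0             # actor delivers d = s, reward 1
        direct_best += 1.0 if s == 0 else 0.0
    return rec_welfare / num_rounds, direct_best / num_rounds
```

The gap (1.0 versus roughly 0.5) is exactly the situation in which optimizing recommendation welfare, rather than any directly implementable treatment policy, is the right platform objective.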
Experimental benchmarks across five challenging scenarios show BRACE's practical value: safety manifests as regret on easy problems, as abstention and wide valid intervals under weak identification, as a reason to prefer recommendation welfare under homogeneity failure, and as tighter structural uncertainty when extra instruments are available. For rich-context problems, the research also derives an orthogonal score whose conditional bias factorizes into compliance-model and outcome-model errors, clarifying what must be stabilized for anytime-valid semiparametric IV inference.
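To make the factorized-bias point concrete, here is a hedged sketch of a Neyman-orthogonal IV score in the spirit of double machine learning; it is not necessarily the paper's exact score. After residualizing instrument, treatment, and outcome on context via nuisance estimates `m_hat`, `r_hat`, `l_hat`, the structural effect solves a single moment equation, and its first-order bias involves only products of compliance-model and outcome-model errors:

```python
import numpy as np

def orthogonal_iv_theta(Y, D, Z, m_hat, r_hat, l_hat):
    """Illustrative orthogonalized IV estimate. Solves
        sum_i (Z_i - m_i) * (Y_i - l_i - theta * (D_i - r_i)) = 0
    for theta, i.e. theta = <Z - m, Y - l> / <Z - m, D - r>.
    Because the score is orthogonal, first-order bias is a product of
    errors in the compliance side (m_hat, r_hat) and the outcome side
    (l_hat), so stabilizing either model protects the estimate."""
    z_resid = Z - m_hat
    return float(np.dot(z_resid, Y - l_hat) / np.dot(z_resid, D - r_hat))
```

Under perfect compliance (D = Z) this reduces to the usual residual-on-residual regression coefficient, recovering the true treatment effect.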
- BRACE algorithm solves bandit problems where AI recommendations differ from actual treatments (noncompliance)
- Provides simultaneous policy-value validity and fixed-gap identification of optimal recommendation/treatment policies
- Experimental benchmarks show safety through abstention, valid intervals, and preference for recommendation welfare under certain conditions
Why It Matters
Enables safer AI decision systems in healthcare, finance, and content moderation where human-AI collaboration creates compliance gaps.