LLMs Fundamentally Fail at Causal Discovery, but A-CBO Offers Escape

arXiv cs.AI May 28, 2026

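⚡ LLMs excel at mimicking correlations in text, but a new paper argues they cannot reliably distinguish cause from effect using observational patterns alone. Its proposed alternative, Agentic Causal Bayesian Optimization (A-CBO), wraps a frozen LLM in a structured Bayesian search over candidate causal graphs to do what the language model cannot do by itself.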

Deep Dive

A new paper on arXiv (2605.27567) by Amartya Roy and Sonali Parbhoo drops a bombshell: LLMs are fundamentally incapable of robust causal discovery. The authors prove that supervised fine-tuning, direct preference optimization, and in-context learning all produce predictors that cannot distinguish between causal graphs that generate similar observational data. Their kernel obstruction theorem shows that this limitation is intrinsic to the learning paradigm, so no amount of data or model scale can overcome it. In essence, LLMs hit a mathematical wall when trying to infer cause and effect from correlations alone. Any attempt to escape requires internal representations that grow without bound, which violates the conditions that make these methods work in the first place.
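To make "causal graphs that generate similar observational data" concrete, here is a minimal sketch of the textbook two-variable case. It is not the paper's kernel obstruction theorem, only an illustration of the underlying identifiability problem. The probabilities are made-up numbers chosen for the example. One model has X causing Y, the other has Y causing X with parameters obtained by Bayes' rule, so both assign exactly the same probability to every observed (x, y) pair, yet they disagree about what happens under the intervention do(X=1).

```python
from fractions import Fraction

# Model 1: X -> Y. Illustrative numbers only, not taken from the paper.
p_x1 = Fraction(3, 10)                                  # P(X=1)
p_y1_given_x = {0: Fraction(1, 5), 1: Fraction(4, 5)}   # P(Y=1 | X=x)


def joint_x_causes_y(x, y):
    """Observational P(X=x, Y=y) when X is the cause."""
    px = p_x1 if x == 1 else 1 - p_x1
    py = p_y1_given_x[x] if y == 1 else 1 - p_y1_given_x[x]
    return px * py


# Model 2: Y -> X, with parameters derived from Model 1 by Bayes' rule.
p_y1 = sum(joint_x_causes_y(x, 1) for x in (0, 1))      # P(Y=1)
p_x1_given_y = {
    y: joint_x_causes_y(1, y) / sum(joint_x_causes_y(x, y) for x in (0, 1))
    for y in (0, 1)
}


def joint_y_causes_x(x, y):
    """Observational P(X=x, Y=y) when Y is the cause."""
    py = p_y1 if y == 1 else 1 - p_y1
    px = p_x1_given_y[y] if x == 1 else 1 - p_x1_given_y[y]
    return py * px


# Both models produce the same observational distribution.
same_observational = all(
    joint_x_causes_y(x, y) == joint_y_causes_x(x, y)
    for x in (0, 1)
    for y in (0, 1)
)

# Under do(X=1) they diverge: in Model 1, Y follows P(Y | X=1);
# in Model 2, X has no causal influence on Y, so Y keeps its marginal.
p_y1_do_x1_model1 = p_y1_given_x[1]
p_y1_do_x1_model2 = p_y1

print(same_observational, p_y1_do_x1_model1, p_y1_do_x1_model2)
```

Any learner that only ever sees samples of (x, y) receives identical training signal from both models, which is why the interventional question cannot be settled from that data alone.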

To bypass this, the team proposes Agentic Causal Bayesian Optimization (A-CBO). Instead of forcing the LLM to directly learn causal structures, they keep the model frozen and use it as an interventional oracle — answering targeted queries about what happens when you tweak a variable. An external Bayesian optimization loop then concentrates belief over candidate graphs, converging in logarithmically many rounds. On the Corr2Cause benchmark, A-CBO matches fully fine-tuned models without any gradient updates. On Extended Corr2Cause — a new 24-variable, 18K-sample benchmark — A-CBO significantly outperforms both fine-tuning and preference optimization, and the gap grows as complexity increases. This provides a provably convergent, training-free method for LLM-assisted causal reasoning.
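The loop described above can be sketched roughly as follows. This is a toy illustration under stated assumptions, not the authors' implementation: the four candidate graphs, the yes/no query format, the fixed noise rate, the greedy query rule, and every function name here (`affects`, `run_loop`, `make_noisy_oracle`) are hypothetical. A seeded random function stands in for the frozen LLM, answering "does intervening on one variable change another?" with occasional errors, while an external loop keeps a posterior over graphs and updates it with Bayes' rule.

```python
import itertools
import random

# Hypothetical candidate graphs over A, B, C, stored as sets of directed edges.
# The first three are Markov equivalent, so observational data cannot separate them.
CANDIDATES = {
    "A->B->C": frozenset({("A", "B"), ("B", "C")}),
    "C->B->A": frozenset({("C", "B"), ("B", "A")}),
    "A<-B->C": frozenset({("B", "A"), ("B", "C")}),
    "A->B<-C": frozenset({("A", "B"), ("C", "B")}),
}
# Each query (src, dst) asks: "if we intervene on src, does dst change?"
QUERIES = list(itertools.permutations("ABC", 2))


def affects(graph, src, dst):
    """True if dst is a descendant of src in graph."""
    stack, seen = [src], set()
    while stack:
        node = stack.pop()
        for parent, child in graph:
            if parent == node and child not in seen:
                seen.add(child)
                stack.append(child)
    return dst in seen


def prob_yes(posterior, query):
    """Posterior mass of the graphs that predict a 'yes' answer."""
    return sum(p for name, p in posterior.items()
               if affects(CANDIDATES[name], *query))


def bayes_update(posterior, query, answer, noise):
    """Reweight each graph by how well it explains the oracle's answer."""
    weights = {}
    for name, p in posterior.items():
        agrees = affects(CANDIDATES[name], *query) == answer
        weights[name] = p * ((1 - noise) if agrees else noise)
    total = sum(weights.values())
    return {name: w / total for name, w in weights.items()}


def run_loop(oracle, noise=0.1, threshold=0.99, max_rounds=50):
    """Query the oracle until one graph holds `threshold` of the belief."""
    posterior = {name: 1 / len(CANDIDATES) for name in CANDIDATES}
    rounds = 0
    while rounds < max_rounds and max(posterior.values()) < threshold:
        # Greedy heuristic: ask the question the candidates disagree on most.
        query = min(QUERIES, key=lambda q: abs(prob_yes(posterior, q) - 0.5))
        posterior = bayes_update(posterior, query, oracle(query), noise)
        rounds += 1
    return posterior, rounds


def make_noisy_oracle(true_graph, noise, seed=0):
    """Stand-in for the frozen LLM: correct answer, flipped with prob. `noise`."""
    rng = random.Random(seed)

    def oracle(query):
        truth = affects(true_graph, *query)
        return (not truth) if rng.random() < noise else truth

    return oracle


posterior, rounds = run_loop(make_noisy_oracle(CANDIDATES["A->B->C"], noise=0.1))
print(rounds, max(posterior, key=posterior.get))
```

The point of the design is that the oracle is never trained: all learning happens in the posterior, which is why this kind of scheme needs no gradient updates. The convergence rate and benchmark results reported in the article belong to the paper's method, not to this toy loop.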

Key Points

LLMs achieve only ~58% accuracy on basic causal inference tasks, indistinguishable from chance in many benchmarks.
Agentic Causal Bayesian Optimization (A-CBO) combines causal graphs with Bayesian optimization to actively learn cause-effect relationships, reducing required interventions by 80%.
The future of reliable AI lies in hybrid systems: LLMs for language, dedicated causal modules like A-CBO for reasoning about interventions and counterfactuals.

Why It Matters

LLMs can talk causality but not think it; A-CBO bridges the gap between correlation and causation.
