Bengio's Scientist AI critiqued: alignment and causal inference flaws exposed
Yoshua Bengio's non-agentic AI plan may be impossible and unsafe, argues new analysis.
Yoshua Bengio's proposed 'Scientist AI', a non-agentic, tool-like AI for scientific discovery, faces serious theoretical and practical objections, according to a new analysis by Matthew Khoriaty on LessWrong. Khoriaty argues that Bengio's plan overlooks alignment failures: a Scientist AI tasked with curing cancer could logically output steps that involve creating an agentic AI, reintroducing the very risks Bengio seeks to avoid. He also argues that Bengio's method trains the AI on associative conditional probabilities, whereas causal inference (as pioneered by Judea Pearl) requires taking actions, that is, interventions, to distinguish correlation from causation. Without such actions, Khoriaty contends, the AI cannot deduce true causal models, which would make genuine scientific advancement impossible. He further notes that Bengio proposes constructing a formal language that maps to reality, a decades-old unsolved problem, and that training without human data risks missing malicious patterns in pre-training.
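The correlation-versus-causation point can be illustrated with a minimal sketch. It is not taken from Khoriaty's post or from Bengio's proposal: the two toy models, the probabilities `P_ROOT` and `P_COPY`, and all function names are assumptions chosen for illustration. One model has X causing Y and the other has Y causing X. Both produce the same observational joint distribution, so an associative quantity such as P(Y=1 | X=1) is the same in each, while the interventional quantity P(Y=1 | do(X=1)) differs.

```python
from itertools import product

P_ROOT = 0.5  # assumed P(root variable = 1); illustrative value
P_COPY = 0.9  # assumed P(effect copies its cause); illustrative value


def bernoulli(p, value):
    """Probability that a Bernoulli(p) variable equals `value` (0 or 1)."""
    return p if value == 1 else 1 - p


def joint_x_causes_y():
    """Observational P(x, y) under the hypothetical model X -> Y."""
    return {
        (x, y): bernoulli(P_ROOT, x) * bernoulli(P_COPY, int(y == x))
        for x, y in product((0, 1), repeat=2)
    }


def joint_y_causes_x():
    """Observational P(x, y) under the hypothetical model Y -> X."""
    return {
        (x, y): bernoulli(P_ROOT, y) * bernoulli(P_COPY, int(x == y))
        for x, y in product((0, 1), repeat=2)
    }


def p_y1_given_x1(joint):
    """Associative quantity P(Y=1 | X=1), computed from the joint alone."""
    return joint[(1, 1)] / (joint[(1, 0)] + joint[(1, 1)])


def p_y1_do_x1(model):
    """Interventional quantity P(Y=1 | do(X=1)) for each hypothetical model."""
    if model == "x_causes_y":
        # Forcing X=1 feeds through the X -> Y mechanism.
        return P_COPY
    if model == "y_causes_x":
        # Forcing X cuts the Y -> X arrow, so Y keeps its own prior.
        return P_ROOT
    raise ValueError(f"unknown model: {model}")


if __name__ == "__main__":
    a, b = joint_x_causes_y(), joint_y_causes_x()
    print("same observational joint:", a == b)
    print("P(Y=1 | X=1):", p_y1_given_x1(a), p_y1_given_x1(b))
    print("P(Y=1 | do(X=1)):", p_y1_do_x1("x_causes_y"), p_y1_do_x1("y_causes_x"))
```

The sketch shows only that these two particular models cannot be told apart from their joint distribution. It makes no claim about how a Scientist AI would actually be trained.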
Despite these critiques, Khoriaty acknowledges positive aspects. Bengio's short-term plan, fine-tuning LLMs to hypothesize what could go wrong with a user's request, is practical and can improve system safety. He also praises the framework for characterizing risky agentic AI by affordances, goal-directedness, and intelligence, and appreciates the concept of 'anytime preparedness'. However, Khoriaty concludes that the core Scientist AI vision is not doable and has likely already been dismissed by other AI safety researchers. He respects Bengio's contributions but expects the group's valuable outputs to come from narrower, more feasible approaches, not from the grand plan as written.
- Bengio's Scientist AI could produce agentic sub-steps (e.g., 'make an agent to cure cancer'), creating alignment risks.
- Khoriaty argues that causal discovery requires taking actions, while Bengio's plan relies on associative probabilities, which would make scientific inference impossible.
- Khoriaty endorses the short-term goal of fine-tuning LLMs to hypothesize request risks for practical safety gains.
Why It Matters
This critique challenges a high-profile AI safety proposal and suggests that even a non-agentic AI may need some capacity to act in order to do real science.