ICML paper warns: LLMs can't reliably discover scientific mechanisms
Predictive success ≠ understanding – why LLMs fool science.
Tyler H. McCormick's position paper (ICML 2026) challenges the growing use of LLMs and complex AI models for scientific hypothesis generation. The core argument: when models perform well on high-dimensional proxy data, multiple competing mechanisms can explain the same observations equally well, so predictive success is an unreliable indicator of true mechanistic understanding. LLMs worsen the problem by merging diverse explanations into a single, coherent-seeming narrative, which gives false confidence in the discovered 'mechanisms.'
The paper proposes concrete norms for a new field of 'mechanistic ML,' insisting that scientific discovery workflows must prioritize identifying underlying structure (e.g., causal frameworks or invariants) rather than fitting increasingly complex models. This is especially critical as LLM-based systems are deployed in high-stakes domains like biology, economics, and social science. McCormick argues that without these standards, AI will simulate science rather than advance it.
- Mechanistic learning is generically underdetermined: many incompatible mechanisms match the same observational data.
- LLMs collapse equivalence classes of explanations into a single fluent narrative, increasing risk of false discoveries.
- McCormick proposes concrete standards for 'mechanistic ML' to ensure LLM workflows support genuine scientific progress.
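The underdetermination point can be made concrete with a toy case. The sketch below is not from the paper: it is a textbook two-variable linear-Gaussian example, and every name and number in it (`cov_x_causes_y`, `cov_y_causes_x`, the coefficient 0.8) is an illustrative assumption. It builds two incompatible mechanisms, one where X drives Y and one where Y drives X, that imply exactly the same observational distribution yet disagree about what happens under an intervention on X.

```python
import numpy as np

# Illustrative assumption: both mechanisms are zero-mean linear-Gaussian,
# so each one's observational distribution is fully described by its
# 2x2 covariance matrix over (X, Y).

def cov_x_causes_y(b, noise_var):
    """Mechanism A: X ~ N(0, 1), Y = b*X + e, e ~ N(0, noise_var)."""
    var_x = 1.0
    var_y = b**2 * var_x + noise_var
    cov_xy = b * var_x
    return np.array([[var_x, cov_xy], [cov_xy, var_y]])

def cov_y_causes_x(c, var_y, noise_var):
    """Mechanism B: Y ~ N(0, var_y), X = c*Y + u, u ~ N(0, noise_var)."""
    var_x = c**2 * var_y + noise_var
    cov_xy = c * var_y
    return np.array([[var_x, cov_xy], [cov_xy, var_y]])

# Hypothetical parameters for mechanism A.
b, s2 = 0.8, 0.36

# Parameters for mechanism B chosen to reproduce A's covariance.
var_y = b**2 + s2            # marginal variance of Y under A
c = b / var_y                # regression coefficient of X on Y
u_var = 1.0 - c**2 * var_y   # residual variance so that Var(X) = 1

cov_a = cov_x_causes_y(b, s2)
cov_b = cov_y_causes_x(c, var_y, u_var)

# Same observational distribution: no amount of predictive fit on
# (X, Y) samples can tell the two mechanisms apart.
same_observational_fit = np.allclose(cov_a, cov_b)

# Different answers under the intervention do(X = x_set).
x_set = 2.0
effect_a = b * x_set   # under A, Y responds to X
effect_b = 0.0         # under B, setting X leaves Y's mean unchanged

print("identical observational covariance:", same_observational_fit)
print("E[Y | do(X=2)] under A:", effect_a)
print("E[Y | do(X=2)] under B:", effect_b)
```

In this toy setting, only extra structure (an experiment, an assumed causal ordering, or an invariance across environments) can separate the two mechanisms, which is the kind of constraint the proposed norms call for.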
Why It Matters
Without structural priors, AI-driven science risks replacing real discovery with compelling fiction.