Research & Papers

Sound Agentic Science Requires Adversarial Experiments

LLM agents accelerate discovery, but they also accelerate the production of plausible, endlessly revisable false claims.

Deep Dive

A new paper presented at the ICLR 2026 Workshop on Agents in the Wild, titled "Sound Agentic Science Requires Adversarial Experiments" by Dionizije Fa and Marko Culjak, warns that LLM-based agents are being rapidly adopted for scientific data analysis, automating tasks once limited by human time. While this is often framed as an acceleration of discovery, the authors argue it also accelerates a familiar failure mode: the mass production of plausible, endlessly revisable analyses. In effect, the hypothesis space is turned into a pool of candidate claims, each backed by selectively chosen analyses optimized for publishable positives.
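The mechanism behind those "publishable positives" is the familiar multiple-comparisons problem, run at machine speed. The minimal sketch below is ours, not the paper's: it draws pure-noise data, tries many candidate analyses, and reports only the most significant one, which inflates the false-positive rate far beyond the nominal 5%.

```python
# Minimal sketch (not from the paper): how selectively chosen analyses
# inflate false positives. Draw pure-noise data, run many candidate
# "analyses" (correlating an outcome with unrelated features), and
# report only the best p-value.
import numpy as np
from scipy.stats import pearsonr

rng = np.random.default_rng(0)
n_trials, n_samples, n_candidate_analyses = 1000, 50, 20
alpha = 0.05

false_positives = 0
for _ in range(n_trials):
    outcome = rng.normal(size=n_samples)               # no real effect anywhere
    features = rng.normal(size=(n_candidate_analyses, n_samples))
    # "Selective reporting": keep only the most significant result.
    best_p = min(pearsonr(f, outcome)[1] for f in features)
    false_positives += best_p < alpha

# A single pre-registered test is wrong ~5% of the time; picking the best
# of 20 is wrong roughly 1 - 0.95**20 ~ 64% of the time.
print(f"False-positive rate with selective reporting: {false_positives / n_trials:.2f}")
```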

The authors emphasize that scientific knowledge is not validated by the iterative accumulation of code and post hoc statistical support: a fluent explanation or a significant result on a single dataset is not verification. The missing evidence is a negative space, namely the experiments and analyses that would have falsified the claim but were never run or never published. The paper therefore proposes a falsification-first standard: agents should not be used primarily to craft the most compelling narrative, but to actively search for the ways a claim can fail. This adversarial approach aims to safeguard against the proliferation of false positives in AI-driven research.
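As a rough illustration of the idea, a falsification-first workflow only accepts a claim after it survives checks designed to break it. The structure and function names in this sketch are hypothetical, chosen by us rather than taken from the paper; they stand in for whatever adversarial battery an agent would actually run.

```python
# Hypothetical sketch of a falsification-first check (our structure, not the
# paper's): a claim is accepted only if every attempt to break it fails.
import numpy as np
from scipy.stats import pearsonr

def placebo_test(x, y, rng, n_perm=1000, alpha=0.05):
    """Permutation test: the association must exceed what label-shuffling produces."""
    observed_r = abs(pearsonr(x, y)[0])
    null_rs = [abs(pearsonr(x, rng.permutation(y))[0]) for _ in range(n_perm)]
    return np.mean([r >= observed_r for r in null_rs]) < alpha

def holdout_replication(x, y, alpha=0.05):
    """The effect must reappear, with the same sign, on a held-out split."""
    half = len(x) // 2
    r1, p1 = pearsonr(x[:half], y[:half])
    r2, p2 = pearsonr(x[half:], y[half:])
    return p1 < alpha and p2 < alpha and np.sign(r1) == np.sign(r2)

def claim_survives(x, y, seed=0):
    """Accept the claim only if no falsification attempt succeeds."""
    rng = np.random.default_rng(seed)
    return placebo_test(x, y, rng) and holdout_replication(x, y)
```

Even in this toy form, a claim passing `claim_survives` only means it withstood these particular attacks; the broader point is that the attack set, not the narrative, carries the evidential weight.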

Key Points
  • LLM agents can produce plausible but false scientific claims by selectively choosing analyses.
  • The paper proposes a "falsification-first" standard where agents actively seek ways a claim can fail.
  • Presented at the ICLR 2026 Workshop on Agents in the Wild, challenging how AI agents are currently deployed in research.

Why It Matters

This could reshape how AI agents are deployed in science, prioritizing falsification over narrative generation.