LLM-AutoSciLab uses active experimentation to automate scientific discovery

arXiv cs.LG May 26, 2026

⚡LLMs now design and run their own experiments, needing 2–5x fewer experimental samples than prior baselines.

Deep Dive

Most AI-driven scientific discovery treats the process as supervised learning on fixed datasets, but real science is dynamic: hypotheses guide experiments, and results refine the hypotheses. A new paper from researchers at Virginia Tech and IBM, “LLM-AutoSciLab: Closed-Loop Scientific Discovery via Active Experimentation with LLMs,” addresses this gap with a framework in which LLMs propose plausible mechanisms, select the most informative experiments to test them, and update their understanding based on the outcomes.
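The "update their understanding based on the outcomes" step can be illustrated with ordinary Bayesian updating over a fixed set of candidate laws. This is a minimal sketch, not the paper's method: the article does not say how LLM-AutoSciLab revises its hypotheses. The three hard-coded laws stand in for LLM-proposed mechanisms, and the Gaussian noise model, `sigma=0.5`, and the names `HYPOTHESES`, `gaussian_log_likelihood`, and `update_beliefs` are all hypothetical.

```python
import math

# Hypothetical competing laws y = f(x). In LLM-AutoSciLab an LLM would propose
# the candidate mechanisms; here three are hard-coded purely for illustration.
HYPOTHESES = {
    "linear": lambda x: 2.0 * x,
    "quadratic": lambda x: x ** 2,
    "inverse_square": lambda x: 16.0 / x ** 2,
}


def gaussian_log_likelihood(y_obs, y_pred, sigma):
    """Log-density of y_obs under an assumed N(y_pred, sigma^2) noise model."""
    z = (y_obs - y_pred) / sigma
    return -0.5 * z * z - math.log(sigma * math.sqrt(2.0 * math.pi))


def update_beliefs(beliefs, x, y_obs, sigma=0.5):
    """Bayes' rule: posterior is proportional to prior times likelihood.

    `beliefs` maps hypothesis name -> probability; returns a new normalized dict.
    """
    log_post = {}
    for name, prior in beliefs.items():
        # A hypothesis with zero prior stays ruled out (log 0 = -inf).
        log_prior = math.log(prior) if prior > 0 else -math.inf
        y_pred = HYPOTHESES[name](x)
        log_post[name] = log_prior + gaussian_log_likelihood(y_obs, y_pred, sigma)
    # Subtract the max before exponentiating to avoid underflow.
    max_lp = max(log_post.values())
    weights = {name: math.exp(lp - max_lp) for name, lp in log_post.items()}
    total = sum(weights.values())
    return {name: w / total for name, w in weights.items()}


if __name__ == "__main__":
    beliefs = {name: 1.0 / len(HYPOTHESES) for name in HYPOTHESES}  # uniform prior

    # Noise-free observations from a hidden "true" law y = x**2.
    # At x = 2.0 all three laws predict 4.0, so that measurement is uninformative.
    for x in (2.0, 4.0):
        beliefs = update_beliefs(beliefs, x, x ** 2)
        print(x, {name: round(p, 3) for name, p in beliefs.items()})
```

The two updates show why the choice of experiment matters: the measurement at x = 2 leaves the beliefs uniform because every candidate law predicts the same value there, while a single measurement at x = 4 concentrates nearly all the probability on the quadratic law.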

LLM-AutoSciLab was evaluated on three benchmarks: NewtonBench (physics), ActiveSciBench-Chem (enzyme kinetics, 57 tasks), and ActiveSciBench-GRN (gene regulatory networks, 45 tasks). It achieved 67.6% symbolic accuracy on NewtonBench and 35.1% accuracy on ActiveSciBench-Chem, and it recovered 31.1% of the true graph structure on ActiveSciBench-GRN, while using 2–5x fewer experimental samples than leading competitors. The key innovation is hypothesis-conditioned experiment selection: instead of passively fitting curves to existing data, the LLM-driven agent decides which variable to probe next to resolve uncertainty among its competing hypotheses. Code and datasets are open-sourced on GitHub.
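Hypothesis-conditioned experiment selection can be sketched as choosing, from a menu of candidate inputs, the one where the hypotheses still in play disagree most. The sketch below swaps in a simple maximin heuristic (maximize the smallest pairwise gap between predictions) because the article does not specify the paper's actual selection criterion. The candidate laws, `CANDIDATE_XS`, the 0.01 belief threshold, and the helper names `separation` and `select_experiment` are all hypothetical.

```python
from itertools import combinations

# Hypothetical competing laws y = f(x); stand-ins for LLM-proposed mechanisms.
HYPOTHESES = {
    "linear": lambda x: 2.0 * x,
    "quadratic": lambda x: x ** 2,
    "inverse_square": lambda x: 16.0 / x ** 2,
}

# Hypothetical menu of inputs the agent may choose to measure.
CANDIDATE_XS = [0.5, 1.0, 2.0, 3.0, 4.0]


def separation(x, live):
    """Smallest gap between the predictions of any two live hypotheses at x.

    Zero means at least two hypotheses agree at x, so measuring there cannot
    tell them apart; a large value means one measurement separates every pair.
    """
    preds = [HYPOTHESES[name](x) for name in live]
    return min(abs(a - b) for a, b in combinations(preds, 2))


def select_experiment(candidates, beliefs, threshold=0.01):
    """Return the candidate input that best separates the hypotheses still in play.

    Hypotheses whose belief has fallen below `threshold` are ignored, so the
    choice is conditioned on the agent's current state of knowledge.
    """
    live = [name for name, p in beliefs.items() if p >= threshold]
    if len(live) < 2:
        return None  # at most one hypothesis left: nothing to discriminate
    return max(candidates, key=lambda x: separation(x, live))


if __name__ == "__main__":
    uniform = {name: 1.0 / len(HYPOTHESES) for name in HYPOTHESES}
    print(select_experiment(CANDIDATE_XS, uniform))  # prints 4.0

    # After the linear law has been ruled out, a different probe is preferred.
    narrowed = {"linear": 0.0, "quadratic": 0.5, "inverse_square": 0.5}
    print(select_experiment(CANDIDATE_XS, narrowed))  # prints 0.5
```

Because the score depends on which hypotheses remain live, the preferred probe changes as beliefs shift: x = 4 separates all three laws, but x = 0.5 best separates the two survivors once the linear law is discarded. A more principled criterion such as expected information gain would also weigh measurement noise and belief strength; the maximin rule is used here only because its behavior is easy to check by hand.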

Key Points

LLM-AutoSciLab uses LLMs to generate competing hypotheses and run active experiments to distinguish them, achieving 67.6% symbolic accuracy on NewtonBench.
On ActiveSciBench-Chem (57 enzyme kinetics tasks) it scored 35.1% accuracy; on ActiveSciBench-GRN (45 gene regulatory network tasks) it recovered 31.1% of true graph structure.
The framework is 2–5x more sample-efficient than prior baselines by focusing experiments on high-information observations.

Why It Matters

Automated, adaptive experimentation could accelerate drug discovery, genomics, and materials science by reducing the number of lab trials needed.
