Research & Papers

Learning to Disprove: Formal Counterexample Generation with Large Language Models

New AI model learns to disprove false theorems by generating verifiable counterexamples in Lean 4.

Deep Dive

A team of researchers has introduced 'Learning to Disprove,' a novel approach that trains large language models (LLMs) to perform formal counterexample generation, a critical but often neglected skill in mathematical reasoning. While current AI efforts focus heavily on proving true statements, this work addresses the complementary task of disproving false ones. The researchers fine-tune LLMs not only to propose candidate counterexamples but also to produce formal, machine-checkable disproofs in the Lean 4 theorem prover, so every generated counterexample is rigorously validated.
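To make the task concrete, here is a minimal illustration of what a machine-verified disproof looks like in Lean 4 (the statement is a generic example, not drawn from the paper): a false universal claim is refuted by instantiating it at a concrete counterexample and letting the kernel check the proof of the negation.

```lean
-- Illustrative only: a false universal claim about natural numbers.
-- "Every natural number n satisfies n * n ≥ n + 1" fails at n = 0 (and n = 1).
example : ¬ ∀ n : Nat, n * n ≥ n + 1 := by
  intro h
  -- Instantiate the claim at the counterexample n = 0:
  -- the hypothesis becomes 0 * 0 ≥ 0 + 1, i.e. 0 ≥ 1, which `decide` refutes.
  exact absurd (h 0) (by decide)
```

The key point is that the candidate counterexample (here `n = 0`) is not merely asserted: the accompanying proof term is checked by Lean's kernel, which is the rigorous validation the paper requires of its models.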

To enable effective training, the team developed a symbolic mutation strategy that synthesizes diverse datasets: existing theorems are taken and selected hypotheses systematically discarded, turning provable statements into false ones that admit counterexamples. This data generation method is combined with a multi-reward expert iteration framework, which substantially improves both the effectiveness and efficiency of training. Experiments on three newly collected benchmarks show that the combined approach yields significant performance gains, validating the advantages of the methodology over existing techniques.
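The mutation idea can be sketched as follows (the theorem below is illustrative, not from the paper's dataset): dropping a hypothesis from a provable theorem typically yields a false statement, and its disproof reduces to exhibiting a counterexample that the discarded hypothesis used to exclude.

```lean
-- Illustrative original theorem: with the hypothesis 1 ≤ n, the claim is provable.
theorem doubled_ge_succ (n : Nat) (h : 1 ≤ n) : 2 * n ≥ n + 1 := by
  omega

-- Mutated instance: the hypothesis 1 ≤ n has been discarded, so the
-- universal claim is now false; n = 0 is a machine-checkable counterexample.
example : ¬ ∀ n : Nat, 2 * n ≥ n + 1 := by
  intro h
  -- At n = 0 the claim reads 0 ≥ 1, which `decide` refutes.
  exact absurd (h 0) (by decide)
```

Each such mutation pairs a false statement with the hypothesis whose removal broke it, which is what lets the strategy generate counterexample instances at scale.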

Key Points
  • Trains LLMs for formal counterexample generation, requiring both candidate counterexamples and disproofs verifiable in Lean 4.
  • Uses a symbolic mutation strategy to synthesize diverse training data by discarding theorem hypotheses.
  • Employs a multi-reward expert iteration framework that shows significant performance gains on three new benchmarks.

Why It Matters

This advances AI's capacity for rigorous mathematical reasoning, moving beyond proof generation to critical falsification, with applications in formal verification and automated theorem proving.