
ImProver 2: 7B model outperforms larger LLMs on Lean 4 proof optimization

A 7B-parameter model beats 70B+ competitors at restructuring formal proofs in Lean 4.

Deep Dive

ImProver 2, developed by Riyaz Ahuja, Tate Rowney, Jeremy Avigad, and Sean Welleck, is a neurosymbolic framework for automating proof optimization in Lean 4. As formal mathematics libraries grow rapidly, maintaining and refactoring verified proofs becomes critical, both for library quality and for the training data these libraries supply to neural provers. The framework combines a data-efficient expert-iteration pipeline with a scaffold that exposes the formal proof structure alongside lightweight informal abstractions, enabling models to generate structurally improved proofs.
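To make "proof optimization" concrete, here is a small illustrative Lean 4 example of our own (not taken from the paper): the same statement proved twice, first with a roundabout tactic proof that introduces an unnecessary intermediate fact, then with a direct term-mode proof. Proof length is only one possible structural metric; the specific metrics ImProver 2 targets are not detailed here.

```lean
-- Original: a roundabout tactic proof with a redundant intermediate step.
example (a b : Nat) (h : a = b) : b = a := by
  have h2 : b = a := by
    rw [h]  -- rewrites the goal to `b = b`, which `rw` closes by rfl
  exact h2

-- Optimized: the same statement, closed directly by symmetry of equality.
example (a b : Nat) (h : a = b) : b = a := h.symm
```

Both versions are accepted by the Lean checker; an optimizer's job is to find the second kind of proof while never sacrificing correctness.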

The results are notable: a 7B-parameter model trained with ImProver 2 outperforms models orders of magnitude larger within the same family and is competitive with mid-tier frontier models across multiple structural metrics. The neurosymbolic scaffold also substantially improves performance for both small and large models. These results suggest that proof optimization is a scalable, learnable task: even small models can effectively restructure research-level proofs when given proper scaffolding and training.

Key Points
  • ImProver 2 trains a 7B-parameter model that beats much larger models (e.g., 70B+) on proof optimization tasks in Lean 4.
  • Uses expert-iteration pipeline and a scaffold combining formal structure with informal abstractions for data efficiency.
  • Demonstrates that, with proper scaffolding, small models can be competitive with mid-tier frontier systems on complex proof restructuring.
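The expert-iteration idea mentioned above can be sketched generically. This is a minimal illustration under stated assumptions, not ImProver 2's actual pipeline: it assumes a proof is a list of Lean tactic lines, that the model is wrapped in a `propose` callback, that the Lean checker is wrapped in a `verify` callback, and that "better" means a lower score on some structural metric. All names (`Example`, `expert_iteration_round`, `toy_propose`, `toy_verify`) are hypothetical, and the toy stand-ins only mimic a model and a checker.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple

# Hypothetical representation: a proof is a list of Lean tactic lines.
Proof = List[str]


@dataclass
class Example:
    theorem: str  # statement text (only passed through to the verifier)
    proof: Proof  # current, already-verified proof


def expert_iteration_round(
    dataset: List[Example],
    propose: Callable[[Example, int], List[Proof]],
    verify: Callable[[str, Proof], bool],
    metric: Callable[[Proof], float],
    num_samples: int = 4,
) -> List[Tuple[Example, Proof]]:
    """One round of expert iteration for proof optimization (sketch).

    For each example, sample candidate rewrites, keep the best one that
    verifies and strictly improves the metric (lower is better), and return
    (original, improved) pairs to fine-tune the proposer on next round.
    """
    training_pairs: List[Tuple[Example, Proof]] = []
    for ex in dataset:
        best, best_score = ex.proof, metric(ex.proof)
        improved = False
        for candidate in propose(ex, num_samples):
            # Discard anything the checker rejects: correctness comes first.
            if not verify(ex.theorem, candidate):
                continue
            score = metric(candidate)
            if score < best_score:
                best, best_score, improved = candidate, score, True
        if improved:
            training_pairs.append((ex, best))
    return training_pairs


# Toy stand-ins (hypothetical): a real pipeline would sample from the model
# and call the Lean checker instead.
def toy_propose(ex: Example, k: int) -> List[Proof]:
    # Candidate rewrites: drop one tactic line at a time.
    drops = [ex.proof[:i] + ex.proof[i + 1:] for i in range(len(ex.proof))]
    return drops[:k]


def toy_verify(theorem: str, proof: Proof) -> bool:
    # Pretend a proof checks iff it still ends with the closing tactic.
    return len(proof) > 0 and proof[-1] == "exact h.symm"


data = [Example("(a b : Nat) (h : a = b) : b = a", ["skip", "exact h.symm"])]
pairs = expert_iteration_round(data, toy_propose, toy_verify, metric=len)
```

Here `metric=len` scores proofs by tactic count, so `pairs` holds a single pair mapping the two-line proof to `["exact h.symm"]`. A full loop would fine-tune the proposer on these pairs and repeat; that training step is omitted. Filtering through the verifier before scoring is what keeps the self-generated training data trustworthy.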

Why It Matters

Makes formal proof optimization scalable, enabling better libraries and training data for neural theorem provers.