Developer Tools

Leanstral: Open-source agent for trustworthy coding and formal proof engineering

Open-source Lean 4 proof agent with 6B active parameters costs $36 at pass@2 vs. Sonnet 4.6's $549, beating it by 2.6 points on FLTEval.

Deep Dive

Mistral AI has released Leanstral, an open-source AI agent built for generating formally verified code with the Lean 4 proof assistant. Unlike generalist coding models, Leanstral is trained not only to write code but also to produce the machine-checked proofs that establish its correctness against a formal specification. This addresses a bottleneck in high-stakes domains such as frontier mathematics and mission-critical software, where human review of AI-generated logic is slow and demands scarce expertise. The model uses a highly sparse architecture with 6B active parameters and is released under an Apache 2.0 license, available via Mistral's Vibe agent platform and a free API.
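To make "code plus proof" concrete, here is a minimal Lean 4 sketch of the kind of artifact such an agent is meant to produce: a definition together with a theorem that the proof checker verifies against it. The names `double` and `double_eq_two_mul` are hypothetical and chosen for illustration only; this is not output from Leanstral, and it assumes a recent Lean 4 toolchain in which the `omega` tactic is available without extra imports.

```lean
-- Hypothetical example: the "code" part is an ordinary function.
def double (n : Nat) : Nat := n + n

-- The "specification" part is a theorem about that function.
-- Lean accepts the file only if the proof below actually checks.
theorem double_eq_two_mul (n : Nat) : double n = 2 * n := by
  unfold double   -- goal becomes: n + n = 2 * n
  omega           -- linear arithmetic over Nat closes the goal
```

If the definition were changed so that it no longer matched the specification, the proof would fail to check, which is what makes the result trustworthy without line-by-line human review.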

Performance is measured by the new FLTEval benchmark, which tests an agent's ability to complete proofs and define concepts within real formal repositories, moving beyond isolated math problems. Leanstral achieves a score of 26.3 with two attempts (pass@2) at a cost of $36. This beats Anthropic's Sonnet 4.6, which scores 23.7 for $549. Claude Opus 4.6 remains the quality leader with a score of 39.6, but it costs $1,650, roughly 92 times the cost of a single Leanstral attempt (about $18, assuming cost scales with the number of attempts) and about 46 times the $36 pass@2 run. Scores keep improving with more attempts while cost grows roughly in proportion, reaching 31.9 at pass@16 for $290, about 8 points ahead of Sonnet 4.6.
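The comparisons above are simple arithmetic on the quoted figures, and a short script makes them easy to re-check. The sketch below uses only the scores and costs stated in the text; the helper names (`cost_ratio`, `score_gap`, `score_per_100_usd`) are illustrative and not part of any benchmark tooling.

```python
# (FLTEval score, cost in USD) exactly as quoted in the text.
results = {
    "Leanstral pass@2": (26.3, 36),
    "Leanstral pass@16": (31.9, 290),
    "Sonnet 4.6": (23.7, 549),
    "Claude Opus 4.6": (39.6, 1650),
}


def cost_ratio(a: str, b: str) -> float:
    """How many times more expensive run `a` is than run `b`."""
    return results[a][1] / results[b][1]


def score_gap(a: str, b: str) -> float:
    """Score of run `a` minus score of run `b`, rounded to one decimal."""
    return round(results[a][0] - results[b][0], 1)


def score_per_100_usd(name: str) -> float:
    """Benchmark points obtained per 100 USD spent (a rough efficiency measure)."""
    score, cost = results[name]
    return 100 * score / cost


if __name__ == "__main__":
    print("pass@2 vs Sonnet 4.6 (points):", score_gap("Leanstral pass@2", "Sonnet 4.6"))
    print("pass@16 vs Sonnet 4.6 (points):", score_gap("Leanstral pass@16", "Sonnet 4.6"))
    print("Opus cost / pass@2 cost:", round(cost_ratio("Claude Opus 4.6", "Leanstral pass@2"), 1))
    for name in results:
        print(f"{name}: {score_per_100_usd(name):.1f} points per $100")
```

Points per dollar is only a rough lens: it says nothing about which specific problems each model solves, so the absolute score still matters when the hardest proofs are the goal.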

Leanstral was also tested on practical, real-world scenarios such as answering Stack Exchange questions about breaking changes in new Lean versions, a task it handled adeptly despite not being trained on the latest release candidate. The agent supports arbitrary Model Context Protocol (MCP) servers through Vibe and is optimized for lean-lsp-mcp. By providing a cost-effective, open-source path to formally verified code, Mistral is widening access to trustworthy AI-assisted development for research and critical software engineering.

Key Points
  • Leanstral at pass@2 scores 26.3 on FLTEval for $36, beating Sonnet 4.6, which scores 23.7 at a cost of $549.
  • The model uses a sparse architecture with 6B active parameters, is trained for Lean 4, and is open-source under Apache 2.0.
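  • It is a far cheaper, though lower-scoring, alternative to Claude Opus 4.6 ($1,650, score 39.6) for generating provably correct code in high-stakes domains: about 46x cheaper at pass@2, and roughly 92x cheaper for a single attempt.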

Why It Matters

Leanstral substantially lowers the cost and expertise barrier for generating formally verified, trustworthy code in research and critical software.