LLMs verify code via natural language specs, bypassing formal languages
Researchers propose letting LLMs generate code and verify it against specifications written in plain English rather than rigid formal languages.
Traditional formal verification is powerful but impractical for most developers because it requires specifications written in rigid formal languages (e.g., TLA+, or the SMT-LIB input of solvers such as Z3). Prior attempts to use LLMs to synthesize such formal specifications automatically have had limited success. A new paper by Zhaorui Li and Chengyu Song (arXiv:2605.11315) takes the opposite approach: it keeps the specifications in natural language and lets LLMs both generate implementations and verify them against those specifications. The key insight is that LLMs can reason compositionally about natural-language constraints, such as 'the function must never dereference a null pointer', and then check whether generated code satisfies them without a formal encoding step. The authors' preliminary experiments suggest the approach is promising for catching security vulnerabilities in LLM-generated code.
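The article does not describe the paper's prompts or pipeline, so the following is only a minimal hypothetical sketch of the general pattern: plain-English constraints and a piece of code go to a model, which returns a per-constraint verdict. The names `ask_llm`, `build_check_prompt`, `parse_verdicts` and `verify`, and the `PASS`/`FAIL` reply format, are illustrative assumptions rather than the authors' API; `fake_llm` is a stub so the example runs offline without a real model.

```python
import re
from typing import Callable, Dict, List, Tuple

# Hypothetical type: any function that sends a prompt to an LLM and returns its text reply.
LLMCall = Callable[[str], str]


def build_check_prompt(constraints: List[str], code: str) -> str:
    """Ask the model to judge each plain-English constraint against the code."""
    numbered = "\n".join(f"{i}. {c}" for i, c in enumerate(constraints, start=1))
    return (
        "You are reviewing code against natural-language requirements.\n"
        f"Requirements:\n{numbered}\n\n"
        f"Code:\n{code}\n\n"
        "For each requirement, answer on its own line as "
        "'<number>: PASS' or '<number>: FAIL - <reason>'."
    )


# Matches reply lines such as "1: PASS" or "2: FAIL - may dereference NULL".
VERDICT_RE = re.compile(r"^\s*(\d+)\s*:\s*(PASS|FAIL)\b(?:\s*-\s*(.*))?$", re.IGNORECASE)


def parse_verdicts(reply: str, n: int) -> Dict[int, Tuple[bool, str]]:
    """Parse the model's reply; requirements it did not answer count as failures."""
    verdicts: Dict[int, Tuple[bool, str]] = {}
    for line in reply.splitlines():
        m = VERDICT_RE.match(line)
        if m:
            idx = int(m.group(1))
            if 1 <= idx <= n:
                verdicts[idx] = (m.group(2).upper() == "PASS", (m.group(3) or "").strip())
    for idx in range(1, n + 1):
        verdicts.setdefault(idx, (False, "no verdict returned"))
    return verdicts


def verify(code: str, constraints: List[str], ask_llm: LLMCall) -> Dict[str, Tuple[bool, str]]:
    """Return {constraint: (passed, reason)} from a single LLM judgment pass."""
    reply = ask_llm(build_check_prompt(constraints, code))
    verdicts = parse_verdicts(reply, len(constraints))
    return {c: verdicts[i] for i, c in enumerate(constraints, start=1)}


if __name__ == "__main__":
    c_code = "int first(int *p) { return p[0]; }"
    spec = [
        "The function must never dereference a null pointer.",
        "The function must not write to memory.",
    ]

    # Offline stand-in for a real model call (hypothetical canned reply).
    def fake_llm(prompt: str) -> str:
        return "1: FAIL - p is dereferenced without a NULL check\n2: PASS"

    for constraint, (ok, why) in verify(c_code, spec, fake_llm).items():
        print("PASS" if ok else "FAIL", "|", constraint, "|", why)
```

Treating a missing verdict as a failure is a conservative design choice: a checker that silently passed unanswered requirements would hide exactly the kind of vulnerability it is meant to catch.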
The implications for software engineering and AI safety could be significant. As LLM-generated code becomes more common, the ability to check it against user-defined natural-language requirements could become an important safety layer. The approach doesn't require developers to learn formal verification tools; they simply describe in plain English what the code should (or shouldn't) do. The paper is also listed under arXiv's Cryptography and Security (cs.CR) category, which points to security use cases; one plausible example is checking that AI-written patches don't introduce backdoors. If it scales, the method could substantially lower the barrier to checking code against its requirements, especially for the millions of developers now using AI coding assistants.
- LLMs generate code and verify it against specifications written in natural language, avoiding formal specification languages such as TLA+ or the SMT-LIB input used by Z3.
- Preliminary results show promise for identifying security vulnerabilities in LLM-generated code.
- Paper (arXiv:2605.11315) by Li and Song is listed under Software Engineering, Artificial Intelligence, and Cryptography and Security.
Why It Matters
Could make code verification accessible to non-experts and help reduce vulnerabilities in AI-generated code.