Toward Executable Repository-Level Code Generation via Environment Alignment
New AI framework improves the functional correctness of generated code by up to 5.87 percentage points and its non-functional quality by up to 8.66 points.
A team of researchers, including Ruwei Pan and seven others, has introduced EnvGraph, a novel framework designed to clear a major hurdle in AI-powered coding: generating entire, executable software repositories. Current large language models (LLMs) such as GPT-4 and Claude 3 excel at producing plausible code snippets but often fail when tasked with building a complete, multi-file project that can be installed, have its dependencies resolved, and run successfully. EnvGraph addresses this by framing the problem as 'environment alignment': the AI must ensure that both external dependencies (such as libraries) and internal references (links between files) are satisfied for the final codebase to work.
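To make 'environment alignment' concrete, here is a minimal Python sketch of the two conditions it imposes. This is illustrative only, not the paper's implementation: the function names (`external_deps_satisfied`, `internal_refs_satisfied`, `environment_aligned`) are invented, and the checks are deliberately simplified (for instance, each declared requirement is treated as an importable module name).

```python
# Illustrative sketch of environment alignment, not EnvGraph's actual code.
import ast
import importlib.util
from pathlib import Path


def external_deps_satisfied(requirements: list[str]) -> bool:
    """External layer: every declared dependency must be importable.

    Simplification: assumes each requirement name equals its import name.
    """
    return all(importlib.util.find_spec(pkg) is not None for pkg in requirements)


def internal_refs_satisfied(repo_root: Path) -> bool:
    """Internal layer: relative imports must point at code that exists."""
    # Names a relative import could target: module files and package
    # directories anywhere in the repository (another simplification).
    names = {p.stem for p in repo_root.rglob("*.py")}
    names |= {p.name for p in repo_root.rglob("*") if p.is_dir()}
    for path in repo_root.rglob("*.py"):
        for node in ast.walk(ast.parse(path.read_text())):
            # level > 0 marks a relative (intra-repository) import.
            if isinstance(node, ast.ImportFrom) and node.level > 0 and node.module:
                if node.module.split(".")[0] not in names:
                    return False
    return True


def environment_aligned(repo_root: Path, requirements: list[str]) -> bool:
    """A repository counts as aligned only when both layers hold."""
    return external_deps_satisfied(requirements) and internal_refs_satisfied(repo_root)
```

A production system would resolve imports against the real package layout and installed distributions, but the sketch captures the framing: a generated repository is correct only when both layers are satisfied at once.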
EnvGraph operates through a dual-layer environment representation and an iterative alignment loop. It uses execution evidence to attribute errors and guides the generation process with a targeted revision mechanism. The researchers rigorously evaluated the framework on repository-level benchmarks using three different backbone LLMs, comparing it against other environment-aware and repository-level baselines. The results were significant: EnvGraph consistently achieved the top performance, delivering absolute improvements of 5.72 to 5.87 percentage points in Functional Correctness and 4.58 to 8.66 points in Non-Functional Quality over the strongest alternative method. This represents a concrete step toward AI systems that can autonomously build and validate complex software, not just suggest lines of code.
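The iterative loop itself can be pictured with a short sketch. Again, this is a hypothetical outline rather than the authors' code: the `generate`, `execute`, `attribute`, and `revise` callables stand in for the backbone LLM, the sandboxed runner that installs and executes the repository, the execution-evidence-based error attribution, and the targeted revision mechanism.

```python
from dataclasses import dataclass
from typing import Callable

Repo = dict[str, str]  # file path -> source text


@dataclass
class ExecResult:
    succeeded: bool  # did the repository install and run?
    traceback: str   # execution evidence used for error attribution


def build_repository(
    spec: str,
    generate: Callable[[str], Repo],                 # LLM: spec -> draft repo
    execute: Callable[[Repo], ExecResult],           # sandbox: install + run
    attribute: Callable[[str, Repo], list[str]],     # traceback -> blamed files
    revise: Callable[[Repo, list[str], str], Repo],  # regenerate blamed files
    max_rounds: int = 5,
) -> Repo:
    """Generate -> execute -> attribute -> revise until aligned or budget spent."""
    repo = generate(spec)
    for _ in range(max_rounds):
        result = execute(repo)
        if result.succeeded:
            break  # environment aligned: the repository installs and runs
        # Execution evidence pins the failure on specific files or missing
        # packages, so only the implicated parts are regenerated.
        blamed = attribute(result.traceback, repo)
        repo = revise(repo, blamed, result.traceback)
    return repo
```

The targeted-revision step is what would distinguish such a loop from naive regenerate-from-scratch baselines: error attribution narrows each round's edits to the files the execution evidence implicates.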
- EnvGraph formulates repository generation as an environment alignment problem, jointly modeling dependency satisfaction and internal reference resolution.
- The framework outperformed the strongest non-EnvGraph baseline by 5.72-5.87 percentage points in Functional Correctness on repository-level benchmarks.
- It also improved Non-Functional Quality scores by 4.58-8.66 points, showing gains in code quality beyond mere executability.
Why It Matters
This moves AI coding assistants from generating isolated snippets to building complete, deployable software projects, accelerating development.