A Study on the Impact of Fault localization Granularity for Repository-Scale Code Repair Tasks
Research modifies Agentless framework to test how localization granularity impacts AI code repair success.
A new research paper titled "A Study on the Impact of Fault localization Granularity for Repository-Scale Code Repair Tasks" investigates a crucial but often overlooked question in AI-powered programming: how precisely should an AI system locate a bug before trying to fix it? The study, authored by Joseph Townsend, Chandresh Pravin, Kwun Ho Ngan, and Matthieu Parizy, tackles this by modifying the popular Agentless framework, a pipeline for automated program repair on repository-scale benchmarks. Their key innovation was a test environment that assumes perfect fault localization, allowing them to isolate and measure the impact of granularity alone, separate from the model's ability to find the bug in the first place.
Using the SWE-Bench-Mini dataset, which contains real-world GitHub issues, the team tested three granularity levels: line-level, function-level, and file-level. Their findings show that, as a general rule, function-level granularity yields the highest repair rate. This suggests that giving an AI model like GPT-4 or Claude 3.5 the context of an entire function—rather than just a single line or an entire file—provides the optimal balance of specificity and necessary surrounding code context to generate a correct patch. However, the researchers note the ideal approach may be task-dependent, indicating future systems might dynamically choose granularity.
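The three granularity levels can be pictured as three ways of slicing the same source file around a known buggy line. A minimal sketch, using Python's standard `ast` module; the helper name `context_at` is illustrative, not the study's implementation:

```python
# Extract repair context at line-, function-, or file-level granularity
# around a known buggy line. Illustrative only; the paper builds on the
# Agentless pipeline rather than this helper.
import ast

def context_at(source: str, line: int, granularity: str) -> str:
    lines = source.splitlines()
    if granularity == "line":
        return lines[line - 1]                 # just the buggy line
    if granularity == "file":
        return source                          # the whole file
    if granularity == "function":
        # Find the innermost function or method whose span covers `line`.
        best = None
        for node in ast.walk(ast.parse(source)):
            if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef)):
                if node.lineno <= line <= node.end_lineno:
                    if best is None or node.lineno >= best.lineno:
                        best = node
        if best is not None:
            return "\n".join(lines[best.lineno - 1 : best.end_lineno])
        return source  # fall back to file-level if no enclosing function
    raise ValueError(f"unknown granularity: {granularity}")
```

The study's finding is that the middle option tends to win: a single line often lacks the surrounding variables and control flow needed to write a correct patch, while a whole file dilutes the model's attention with irrelevant code.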
This study serves as a foundational proof-of-concept rather than a state-of-the-art benchmark. It provides a new framework for the community to systematically test how the fault localization phase interacts with the repair phase in complex, repository-scale coding tasks. The work encourages further research into optimizing the pipeline for AI coding assistants, which could lead to more reliable tools for developers tackling large-scale codebases.
- Modified the Agentless framework to test fault localization granularity in isolation, assuming perfect bug detection.
- Tested on SWE-Bench-Mini dataset, finding function-level context yields the highest repair rate over line-level and file-level.
- Provides a new testing framework for optimizing the pipeline of AI coding agents, separating localization from repair.
Why It Matters
This research provides a blueprint for building more effective AI coding assistants that can reliably fix complex bugs in large codebases.