Debug2Fix: Supercharging Coding Agents with Interactive Debugging Capabilities
New research shows that equipping AI coding agents with interactive debuggers can let weaker models outperform stronger ones.
Researchers Spandan Garg and Yufan Huang have introduced Debug2Fix, a framework that integrates interactive debugging capabilities directly into AI coding agents. Published on arXiv, the work addresses a critical gap in current coding assistants: while developers routinely use debuggers to examine runtime behavior, most AI coding agents rely on static analysis or trial-and-error test-fix cycles, missing crucial runtime information.
The Debug2Fix framework employs a subagent architecture that incorporates debuggers for Java and Python, allowing AI agents to step through code execution, inspect variables, and understand program state dynamically. When evaluated against GitBug-Java and SWE-Bench-Live benchmarks, the system demonstrated >20% performance improvements over baseline approaches. Perhaps most compellingly, the research showed that with Debug2Fix, less capable models like GPT-5 and Claude Haiku 4.5 could match or even exceed the performance of more sophisticated models like Claude Sonnet 4.5, suggesting that tool design improvements may offer better returns than simply upgrading to more expensive AI models.
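To make the idea concrete, here is a minimal sketch of the kind of runtime inspection a debugging subagent could perform. The paper's actual implementation is not reproduced here; this illustration uses Python's standard-library `bdb` module to step through a function line by line and record local variables, and the names `TraceDebugger` and `buggy_average` are hypothetical, not from the paper.

```python
import bdb

class TraceDebugger(bdb.Bdb):
    """Minimal stepping debugger: records each executed line and the
    local variables at that point -- the sort of runtime state a
    debugging subagent could feed back to a coding agent."""
    def __init__(self):
        super().__init__()
        self.trace = []

    def user_line(self, frame):
        # bdb calls this before each traced line executes.
        # Keep only frames from the function under investigation.
        if frame.f_code.co_name == "buggy_average":
            self.trace.append((frame.f_lineno, dict(frame.f_locals)))

def buggy_average(xs):
    total = 0
    for x in xs:
        total += x
    return total / (len(xs) - 1)  # off-by-one bug: should be len(xs)

dbg = TraceDebugger()
result = dbg.runcall(buggy_average, [2, 4, 6])

# The trace exposes that total reaches 12 but the divisor is wrong,
# a fact invisible to static analysis of the failing test alone.
for lineno, local_vars in dbg.trace:
    print(lineno, local_vars)
```

A real agent would consume such a trace (or issue step/inspect commands interactively) rather than print it, but the core mechanism, observing program state as code executes, is the same.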
This work represents a significant shift in how we think about AI-assisted software development. By giving coding agents access to the same debugging tools developers use daily, Debug2Fix moves beyond simple code generation toward true software engineering assistance. The framework's success indicates that future coding agents will likely incorporate more sophisticated development tools, potentially transforming how software is written, debugged, and maintained. As AI coding assistants become more integrated into development workflows, capabilities like interactive debugging will be essential for handling complex, real-world software engineering tasks.
- Debug2Fix framework integrates interactive debuggers for Java and Python into AI coding agents via subagent architecture
- Achieved >20% performance improvements on GitBug-Java and SWE-Bench-Live benchmarks compared to baseline approaches
- Enabled weaker models (GPT-5, Claude Haiku 4.5) to match or exceed stronger models (Claude Sonnet 4.5) through better tool design
Why It Matters
Better tool design can make affordable AI models outperform expensive ones, democratizing access to high-quality coding assistance.