DebugHarness: Emulating Human Dynamic Debugging for Autonomous Program Repair
New LLM agent outperforms static repair baselines by over 30% (relative) on complex memory vulnerabilities by querying the live runtime.
A research team from Nanjing University has unveiled DebugHarness, a novel AI agent designed to autonomously repair complex, low-level security vulnerabilities like use-after-free and memory corruption in C/C++ software. The system addresses a critical bottleneck in software engineering: while fuzzers can find bugs, fixing deep-rooted, memory-related flaws still demands expert manual analysis. Current LLM-based repair tools treat bug fixing as a static code-generation task, missing the crucial dynamic execution context needed for accurate diagnosis.
DebugHarness overcomes this by emulating the interactive, investigative workflow of a human systems engineer. Starting from a reproducible crash, the agent does not stop at reading static code: it formulates hypotheses about the root cause and then actively probes the live program's memory states and execution paths to test them. It operates in a closed-loop validation cycle, synthesizing patches and testing them against the runtime environment. This dynamic debugging approach proved dramatically more effective than static methods.
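The hypothesize, probe, patch and validate cycle can be sketched as a small loop. This is a minimal illustration under stated assumptions, not DebugHarness's actual implementation: the four callables (`propose_hypothesis`, `probe_runtime`, `synthesize_patch`, `validate_patch`) are hypothetical stand-ins for an LLM-backed hypothesis step, a debugger query against the crashing program, patch generation, and a rebuild-and-rerun check against the crash reproducer.

```python
from typing import Callable, List, Optional


def repair_loop(
    propose_hypothesis: Callable[[List[str]], str],
    probe_runtime: Callable[[str], str],
    synthesize_patch: Callable[[str, List[str]], str],
    validate_patch: Callable[[str], bool],
    max_iterations: int = 5,
) -> Optional[str]:
    """Hypothesize, probe, patch, validate; feed failures back as evidence.

    All four callables are hypothetical placeholders, not DebugHarness APIs.
    Returns the first patch that passes validation, or None if the budget runs out.
    """
    evidence: List[str] = []  # runtime observations and rejected patches seen so far
    for _ in range(max_iterations):
        # Form a root-cause hypothesis from everything observed so far.
        hypothesis = propose_hypothesis(evidence)
        # Test the hypothesis against the live program (e.g. inspect memory at the crash).
        evidence.append(probe_runtime(hypothesis))
        # Draft a patch informed by the hypothesis and the accumulated runtime evidence.
        patch = synthesize_patch(hypothesis, evidence)
        # Close the loop: accept only patches that validate against the runtime.
        if validate_patch(patch):
            return patch
        # Record the failure so the next hypothesis can take it into account.
        evidence.append(f"rejected patch: {patch}")
    return None
```

With toy stubs, `repair_loop(lambda ev: f"h{len(ev)}", lambda h: "state", lambda h, ev: f"fix-{h}", lambda p: p == "fix-h2")` returns `"fix-h2"` on the second iteration, because the probe result and the first rejected patch are both recorded as evidence before the next hypothesis is formed. Feeding rejections back, rather than retrying blindly, is what makes the loop closed.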
Evaluated on SEC-bench, a rigorous dataset of real-world C/C++ security bugs, DebugHarness successfully patched approximately 90% of the vulnerabilities. This performance represents a relative improvement of over 30% compared to state-of-the-art baseline models. The results demonstrate that integrating dynamic, context-aware investigation significantly enhances an LLM's diagnostic and repair capabilities for intricate systems-level problems.
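Because the 30% figure is a relative improvement, it is not the same as a 30-point gap in success rate. Assuming the usual definition of relative improvement, the reported numbers imply an upper bound on the baseline's repair rate (the exact baseline figure is not given here):

```latex
\Delta_{\text{rel}} = \frac{p_{\text{DH}} - p_{\text{base}}}{p_{\text{base}}}
\quad\Longrightarrow\quad
p_{\text{base}} = \frac{p_{\text{DH}}}{1 + \Delta_{\text{rel}}} < \frac{0.90}{1.30} \approx 0.69
```

Here p_DH is DebugHarness's repair rate (about 0.90) and Δ_rel exceeds 0.30, so the baseline repaired fewer than roughly 69% of the bugs, an absolute gap of more than about 21 percentage points.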
The research establishes a new paradigm for automated program repair, effectively bridging the gap between the static reasoning of large language models and the dynamic, state-dependent intricacies of low-level systems programming. It moves AI beyond simple code generation and into the realm of interactive, environment-aware problem-solving, promising to reduce the manual burden on security engineers and accelerate the patching of critical software flaws.
- Patches ~90% of evaluated C/C++ security bugs on the SEC-bench dataset, a relative improvement of more than 30% over top baselines.
- Emulates human debugging by actively querying live runtime memory states and execution paths, not just analyzing static code.
- Targets severe, low-level vulnerabilities like use-after-free and memory corruption that typically require expert manual analysis.
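One concrete way a harness could query a live program's state, offered here as an assumption rather than a description of DebugHarness's own tooling, is to drive GDB non-interactively from Python. The sketch uses real GDB command-line options (`-q`, `-batch`, `-ex`, `--args`); the helper names `gdb_probe_command` and `run_gdb_probe` are hypothetical.

```python
import subprocess
from typing import List, Sequence


def gdb_probe_command(binary: str, program_args: Sequence[str], queries: Sequence[str]) -> List[str]:
    """Build a batch-mode GDB invocation: run the program, then issue each query.

    `--args` must come last: everything after it is passed to the debugged program.
    """
    cmd = ["gdb", "-q", "-batch", "-ex", "run"]
    for query in queries:
        cmd += ["-ex", query]  # e.g. "bt" for the call stack, "info locals" for frame state
    cmd += ["--args", binary, *program_args]
    return cmd


def run_gdb_probe(
    binary: str,
    program_args: Sequence[str],
    queries: Sequence[str],
    timeout: float = 60.0,
) -> str:
    """Execute the probe and return GDB's combined output (requires gdb on PATH)."""
    result = subprocess.run(
        gdb_probe_command(binary, program_args, queries),
        capture_output=True,
        text=True,
        timeout=timeout,
    )
    return result.stdout + result.stderr
```

For example, `run_gdb_probe("./vuln", ["crash_input.bin"], ["bt", "info locals"])` would replay a crashing input and return the backtrace and local variables where execution stopped, which an agent could read as evidence for or against a hypothesis. `./vuln` and `crash_input.bin` are placeholder names.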
Why It Matters
Automates the repair of complex security flaws, which could sharply cut patching time and expert workload for critical software.