Developer Tools

AgentSZZ: Teaching the LLM Agent to Play Detective with Bug-Inducing Commits

New AI framework achieves a 300% recall boost on cross-file bugs and a 60% gain on elusive 'ghost commits' that evade traditional tools.

Deep Dive

A research team from Singapore Management University and Inria has introduced AgentSZZ, a novel framework that transforms Large Language Models (LLMs) into investigative agents for pinpointing the exact code commits that introduced software bugs. This tackles a core problem in software engineering, identifying bug-inducing commits, which is traditionally handled by the SZZ algorithm. SZZ relies on `git blame` and struggles with complex scenarios like 'ghost commits' (where the buggy line no longer exists) and cross-file changes, leaving nearly 25% of bug-inducing commits untraceable. AgentSZZ moves beyond static pipelines by employing an LLM-driven agent that operates in a ReAct-style loop, allowing it to reason, explore the repository, and use specialized tools adaptively, much like a human developer debugging an issue.
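To make the ReAct-style loop concrete, here is a minimal sketch of the reason, act, observe cycle. It is not AgentSZZ's implementation: the tool names (`search_log`, `show_commit`), the commit hash, and the scripted policy that stands in for the LLM are all hypothetical, chosen only so the example runs without a model or a repository.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional, Tuple


@dataclass
class Step:
    """One iteration of the loop: what the agent thought, did, and saw."""
    thought: str
    action: str
    argument: str
    observation: str = ""


# A policy maps the history so far to (thought, action, argument).
# In a real agent this would be an LLM call; here it is a plain function.
Policy = Callable[[List[Step]], Tuple[str, str, str]]
Tool = Callable[[str], str]


def react_loop(policy: Policy, tools: Dict[str, Tool],
               max_steps: int = 5) -> Tuple[Optional[str], List[Step]]:
    """Alternate reasoning and tool use until the policy emits 'finish'."""
    history: List[Step] = []
    for _ in range(max_steps):
        thought, action, argument = policy(history)
        if action == "finish":
            history.append(Step(thought, action, argument))
            return argument, history
        tool = tools.get(action)
        observation = tool(argument) if tool else f"unknown tool: {action}"
        history.append(Step(thought, action, argument, observation))
    return None, history  # step budget exhausted without an answer


def scripted_policy(history: List[Step]) -> Tuple[str, str, str]:
    """Hypothetical stand-in for the LLM: a fixed three-step investigation."""
    if not history:
        return ("Blame found no surviving buggy line; search the history.",
                "search_log", "parse_header")
    if history[-1].action == "search_log":
        candidate = history[-1].observation.split()[0]
        return ("Inspect the candidate commit.", "show_commit", candidate)
    return ("The candidate introduced the faulty code.",
            "finish", history[-1].argument)


# Hypothetical tools returning canned strings instead of querying git.
FAKE_TOOLS: Dict[str, Tool] = {
    "search_log": lambda query: f"abc123 add {query}",
    "show_commit": lambda sha: f"commit {sha}: adds parse_header",
}

answer, trace = react_loop(scripted_policy, FAKE_TOOLS)
for step in trace:
    print(f"{step.action}({step.argument}) -> {step.observation}")
print("bug-inducing commit candidate:", answer)
```

The point of the structure is that the next action depends on the previous observation, which is what lets an agent change course when `git blame` alone yields nothing, whereas a fixed pipeline runs the same steps regardless.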

The system's performance marks a significant leap over previous methods, including other LLM-based approaches. On three standard datasets, AgentSZZ achieved F1-score improvements of up to 27.2%. Its most striking advances are in the hardest cases: a 300% boost in recall for cross-file bugs and a 60% gain for ghost commits. A key innovation is a structured compression module that trims redundant context from tool outputs, cutting token consumption by over 30% without hurting accuracy. This makes the agent-based investigation both more effective and more efficient. The framework demonstrates that equipping LLM agents with domain-specific tools and knowledge is critical for complex, real-world software engineering tasks beyond simple code generation.
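The article does not describe how the compression module works internally, so the following is only an illustrative sketch of the general idea: trimming redundant context from a tool output before it reaches the model. It assumes the tool output is a unified diff, keeps changed lines plus a small window of context, and replaces longer unchanged runs with a one-line marker. The function names and the whitespace-based token count are assumptions made for the example.

```python
def compress_diff(diff_text: str, context: int = 1) -> str:
    """Keep changed lines and hunk headers plus `context` neighbours each side."""
    lines = diff_text.splitlines()
    changed = [i for i, line in enumerate(lines)
               if line.startswith(("+", "-", "@@"))]

    # Mark every line within `context` of a changed line as worth keeping.
    keep = set()
    for i in changed:
        keep.update(range(max(0, i - context),
                          min(len(lines), i + context + 1)))

    out = []
    skipped = 0
    for i, line in enumerate(lines):
        if i in keep:
            if skipped:
                out.append(f"... [{skipped} unchanged lines omitted]")
                skipped = 0
            out.append(line)
        else:
            skipped += 1
    if skipped:
        out.append(f"... [{skipped} unchanged lines omitted]")
    return "\n".join(out)


def rough_token_count(text: str) -> int:
    """Crude proxy for LLM tokens: whitespace-separated words."""
    return len(text.split())


# A made-up hunk: one changed condition surrounded by unchanged lines.
sample = "\n".join(
    ["@@ -1,12 +1,12 @@"]
    + [f" unchanged line {n}" for n in range(5)]
    + ["-    if size > limit:", "+    if size >= limit:"]
    + [f" unchanged line {n}" for n in range(5, 10)]
)

compact = compress_diff(sample)
print(compact)
print(rough_token_count(sample), "->", rough_token_count(compact))
```

The savings in a sketch like this depend entirely on how much unchanged context the tool output carries; the reported reduction of over 30% refers to the authors' module, not to this example.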

Key Points
  • Outperforms state-of-the-art SZZ algorithms by up to 27.2% in F1-score, with recall gains of 300% on cross-file bugs.
  • Uses an LLM agent in a ReAct loop with task-specific tools to dynamically investigate repositories, overcoming limitations of fixed pipelines.
  • Integrates a compression module that reduces token usage by over 30%, making detailed agent-based code analysis more scalable.

Why It Matters

More accurate identification of bug-inducing commits supports defect prediction and root cause analysis, which helps development teams improve software reliability and security.