Developer Tools

BIC-Hunter model improves bug-inducing commit recall by 5.5-7% with confident learning

New AI model filters noisy labels, lifts Recall@1 by 6.16%, and improves Mean First Rank by up to 32.82%

Deep Dive

Just-in-time (JIT) defect prediction is critical for software quality, but real-world training data often contains noisy labels and too little semantic context. To tackle both problems, Weihao Sun and Qiyun Zhao introduce BIC-Hunter (Bug-Inducing Commits Hunter), a model that pairs a data denoising component with a module that captures semantic relationships. The denoising component uses confident learning to filter out inaccurate and inconsistent annotations, making the training data more reliable. The semantic component constructs homogeneous graphs and applies graph convolutional networks (GCNs) to analyze code context more comprehensively, which helps pinpoint the commit that introduced a bug.
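To make the denoising idea concrete, here is a minimal sketch of the generic confident learning recipe: estimate a per-class confidence threshold from a classifier's predicted probabilities, then flag samples that are confidently predicted as a class other than their given label. This is not BIC-Hunter's actual implementation; the function names, the toy data, and the binary labeling (1 for bug-inducing, 0 for clean) are assumptions made for illustration.

```python
from typing import List, Sequence


def class_thresholds(labels: Sequence[int], probs: Sequence[Sequence[float]]) -> List[float]:
    """Per-class threshold: mean predicted probability of class c
    over the samples whose given (possibly noisy) label is c."""
    n_classes = len(probs[0])
    thresholds = []
    for c in range(n_classes):
        own = [p[c] for y, p in zip(labels, probs) if y == c]
        # A class with no samples gets an unreachable threshold.
        thresholds.append(sum(own) / len(own) if own else float("inf"))
    return thresholds


def find_label_issues(labels: Sequence[int], probs: Sequence[Sequence[float]]) -> List[int]:
    """Return indices of samples that are confidently predicted as a
    class different from their given label (likely annotation errors)."""
    thresholds = class_thresholds(labels, probs)
    issues = []
    for i, (y, p) in enumerate(zip(labels, probs)):
        confident = [c for c in range(len(p)) if p[c] >= thresholds[c]]
        if not confident:
            continue  # the model is not confident about any class
        guess = max(confident, key=lambda c: p[c])
        if guess != y:
            issues.append(i)
    return issues


# Hypothetical toy data: given labels and out-of-sample probabilities
# [P(clean), P(bug-inducing)] for six commits.
labels = [1, 1, 1, 0, 0, 0]
probs = [
    [0.10, 0.90],
    [0.20, 0.80],
    [0.90, 0.10],  # labeled bug-inducing, but looks clean
    [0.80, 0.20],
    [0.70, 0.30],
    [0.15, 0.85],  # labeled clean, but looks bug-inducing
]
suspect = find_label_issues(labels, probs)
cleaned = [i for i in range(len(labels)) if i not in suspect]
```

In a pipeline of this kind, the flagged samples would be dropped or relabeled before training the downstream model; the probabilities should come from held-out predictions (for example, cross-validation) so the classifier has not memorized the noisy labels.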

Experiments on a large-scale dataset integrated from three open-source projects show that BIC-Hunter outperforms state-of-the-art methods, with 6.16% higher Recall@1, 7.13% higher Recall@2, and 5.53% higher Recall@3. Mean First Rank (MFR) improves by 8.43% to 32.82%, indicating that BIC-Hunter places bug-inducing commits earlier in its ranked lists. Together, the results suggest that cleaner training labels and richer semantic context make automated identification of bug-inducing commits more effective.
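For readers unfamiliar with these ranking metrics, the sketch below shows one common way to compute them. It assumes Recall@k is the share of true bug-inducing commits found in the top k positions, averaged over bug cases, and that MFR is the average rank of the first true bug-inducing commit (lower is better). The paper's exact definitions may differ, and the commit IDs and helper names are hypothetical.

```python
from typing import List, Optional, Sequence, Set, Tuple

# One case = (ranked candidate commits, set of true bug-inducing commits).
Case = Tuple[Sequence[str], Set[str]]


def first_hit_rank(ranked: Sequence[str], inducing: Set[str]) -> Optional[int]:
    """1-based rank of the first true bug-inducing commit, or None if absent."""
    for rank, commit in enumerate(ranked, start=1):
        if commit in inducing:
            return rank
    return None


def recall_at_k(cases: List[Case], k: int) -> float:
    """Average, over cases, of the fraction of true commits in the top k."""
    per_case = [
        len(set(ranked[:k]) & inducing) / len(inducing)
        for ranked, inducing in cases
    ]
    return sum(per_case) / len(per_case)


def mean_first_rank(cases: List[Case]) -> float:
    """Mean rank of the first hit; cases with no hit are skipped (an assumption)."""
    ranks = [r for r in (first_hit_rank(rk, ind) for rk, ind in cases) if r is not None]
    return sum(ranks) / len(ranks)


# Hypothetical ranked lists for three bug cases.
cases: List[Case] = [
    (["a", "b", "c", "d"], {"a"}),  # first hit at rank 1
    (["e", "f", "g"], {"g"}),       # first hit at rank 3
    (["h", "i", "j", "k"], {"i"}),  # first hit at rank 2
]
r1 = recall_at_k(cases, 1)
r3 = recall_at_k(cases, 3)
mfr = mean_first_rank(cases)
```

Under these definitions, a higher Recall@k and a lower MFR both mean a developer inspects fewer candidate commits before reaching the one that introduced the bug.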

Key Points
  • BIC-Hunter uses confident learning to denoise annotations and improve training data reliability.
  • Graph convolutional networks capture semantic relationships in code commits, boosting root-cause analysis.
  • Outperforms state-of-the-art methods by 6.16% in Recall@1 and by up to 32.82% in MFR on a dataset integrated from three open-source projects.
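
As a rough illustration of the graph convolution step mentioned in the key points, the sketch below implements a single standard GCN layer in plain Python: each node's features are averaged with its neighbors' through a symmetrically normalized adjacency matrix, then passed through a linear transform and ReLU. How BIC-Hunter builds its graphs and how many layers it stacks is not described here, so the toy graph, feature values, and weights are all assumptions.

```python
import math
from typing import List

Matrix = List[List[float]]


def matmul(a: Matrix, b: Matrix) -> Matrix:
    """Plain matrix product of two nested-list matrices."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def normalized_adjacency(adj: Matrix) -> Matrix:
    """Compute D^-1/2 (A + I) D^-1/2, where D is the degree matrix of A + I."""
    n = len(adj)
    a_hat = [[adj[i][j] + (1.0 if i == j else 0.0) for j in range(n)] for i in range(n)]
    deg = [sum(row) for row in a_hat]
    return [[a_hat[i][j] / math.sqrt(deg[i] * deg[j]) for j in range(n)] for i in range(n)]


def gcn_layer(adj: Matrix, features: Matrix, weights: Matrix) -> Matrix:
    """One graph convolution: ReLU(normalized_adjacency @ features @ weights)."""
    propagated = matmul(normalized_adjacency(adj), features)
    transformed = matmul(propagated, weights)
    return [[max(0.0, v) for v in row] for row in transformed]


# Hypothetical 3-node path graph (0 - 1 - 2) with 2-dimensional node features.
adj = [
    [0.0, 1.0, 0.0],
    [1.0, 0.0, 1.0],
    [0.0, 1.0, 0.0],
]
features = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
weights = [[0.5, -0.5], [0.25, 0.75]]  # would be learned in a real model
hidden = gcn_layer(adj, features, weights)
```

Stacking several such layers lets information flow between nodes that are multiple hops apart, which is how a GCN can relate a code change to its wider context.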

Why It Matters

More accurate bug-inducing commit detection means faster debugging cycles and higher software reliability for development teams.