Developer Tools

Reducing Labeling Effort in Architecture Technical Debt Detection through Active Learning and Explainable AI

A new method cuts the manual labeling needed to detect architectural technical debt by nearly half.

Deep Dive

A team of researchers has published an approach to automating the detection of Architecture Technical Debt (ATD), a costly and abstract form of technical compromise that developers explicitly admit to in code comments and issue trackers. The core challenge is that labeling ATD by hand is expensive and hard to scale. The study, by Edi Sutoyo, Paris Avgeriou, and Andrea Capiluppi, starts from a refined dataset of 57 expert-validated examples and uses keyword-based filtering to identify over 103,000 candidate issues from ten open-source projects. The result is a large, pre-filtered pool of potential ATD items for analysis.
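The keyword-filtering step can be pictured with a minimal sketch. The keyword list, issue records, and function names below are hypothetical and chosen only for illustration; the article does not state the study's actual keywords or tooling.

```python
import re

# Hypothetical keyword list, for illustration only (not the study's keywords).
ATD_KEYWORDS = ["architecture", "refactor", "coupling", "dependency", "modular"]


def build_pattern(keywords):
    """Compile one case-insensitive regex matching any keyword as a word prefix."""
    alternation = "|".join(re.escape(k) for k in keywords)
    return re.compile(r"\b(?:" + alternation + r")\w*", re.IGNORECASE)


def filter_candidates(issues, keywords):
    """Return the issues whose text mentions at least one keyword."""
    pattern = build_pattern(keywords)
    return [issue for issue in issues if pattern.search(issue["text"])]


# Made-up issue records standing in for an issue-tracker export.
issues = [
    {"id": "DEMO-1", "text": "Tight coupling between scheduler and storage layer"},
    {"id": "DEMO-2", "text": "Fix typo in README"},
    {"id": "DEMO-3", "text": "Refactoring needed: module dependencies are cyclic"},
]

candidates = filter_candidates(issues, ATD_KEYWORDS)
print([c["id"] for c in candidates])  # ['DEMO-1', 'DEMO-3']
```

A filter like this trades precision for recall: it keeps anything that mentions a keyword, so the resulting pool still needs a classifier or a human to separate real ATD from incidental matches.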

The researchers then applied active learning, a technique in which the model selects the most informative data points for a human to label, to cut down on manual work. Their results show that the 'Breaking Ties' query strategy was the most effective, raising the model's F1-score to 0.72 while reducing the annotation workload by 49%. To make the model's decisions transparent, they integrated two explainable AI (XAI) tools, SHAP and LIME. An expert evaluation found that both produced reasonable explanations for the model's classifications, with a preference for LIME because of its clarity. Together, scalable filtering, efficient active learning, and clear explanations form a practical pipeline that engineering teams can use to manage architectural debt proactively.
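As commonly defined, Breaking Ties queries the unlabeled items for which the gap between the two most probable classes is smallest, since those are the predictions the model is least sure about. The sketch below illustrates that selection rule only; the probabilities and function names are made up, and it is not the study's implementation.

```python
def breaking_ties_margin(probs):
    """Gap between the two highest class probabilities (small gap = uncertain)."""
    top, second = sorted(probs, reverse=True)[:2]
    return top - second


def select_queries(pool_probs, batch_size):
    """Return indices of the `batch_size` pool items with the smallest margins."""
    ranked = sorted(
        range(len(pool_probs)),
        key=lambda i: breaking_ties_margin(pool_probs[i]),
    )
    return ranked[:batch_size]


# Made-up predicted probabilities [P(not ATD), P(ATD)] for five unlabeled issues.
pool_probs = [
    [0.95, 0.05],
    [0.52, 0.48],
    [0.10, 0.90],
    [0.60, 0.40],
    [0.45, 0.55],
]

queries = select_queries(pool_probs, batch_size=2)
print(queries)  # [1, 4]
```

In a full loop, the selected items would be sent to an annotator, added to the training set, and the model retrained before the next round of queries.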

Key Points
  • Combined keyword filtering and active learning reduced manual labeling effort for Architecture Technical Debt (ATD) by 49%.
  • The 'Breaking Ties' active learning strategy achieved the best model performance with a 0.72 F1-score.
  • Experts preferred LIME over SHAP for explaining the AI's ATD classifications, citing better clarity and ease of use.

Why It Matters

The approach could help software teams identify and manage costly architectural compromises with far less manual labeling, saving engineering time and resources.