4 hours wasted per CI failure? New model cuts diagnosis time

arXiv cs.SE May 08, 2026

⚡77,354 build failures from 7 Apache projects reveal 20% unrelated to your code.

Deep Dive

Researchers analyzed 77,354 CI build failures from seven open-source Apache projects and found that developers spend a median of 4 hours determining whether a failure relates to their patch. Using positive-unlabeled (PU) learning, a semi-supervised technique, with 33 features (such as latency, error repeats, and comments), their models achieved 0.70–0.88 precision and 0.63–0.97 AUC, which could help engineers skip false alarms and focus on actionable failures.
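
To make the PU learning idea concrete, here is a minimal sketch in plain Python. The summary does not say which PU algorithm the researchers used, so this falls back on the well-known Elkan and Noto estimator purely as an illustration. Everything in it is an assumption: the two numeric features stand in for signals like error repeats and latency, the data is synthetic, and treating "unrelated to the patch" as the positive class is a choice made for this example only.

```python
import math
import random


def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def fit_logistic(X, s, epochs=300, lr=0.3):
    # Batch gradient descent on log loss; returns (weights, bias).
    n, d = len(X), len(X[0])
    w, b = [0.0] * d, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * d, 0.0
        for x, label in zip(X, s):
            err = sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b) - label
            for j in range(d):
                grad_w[j] += err * x[j]
            grad_b += err
        w = [wi - lr * g / n for wi, g in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b


def labeled_prob(model, x):
    # g(x): estimated probability that x is *labeled*, not that it is positive.
    w, b = model
    return sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b)


def fit_pu(X, s, holdout_frac=0.2, seed=0):
    # Elkan-Noto style: hold out some labeled positives, fit g on the rest,
    # then estimate c = P(labeled | positive) as the mean of g on the holdout.
    rng = random.Random(seed)
    pos_idx = [i for i, label in enumerate(s) if label == 1]
    rng.shuffle(pos_idx)
    holdout = set(pos_idx[: max(1, int(len(pos_idx) * holdout_frac))])
    train = [i for i in range(len(X)) if i not in holdout]
    model = fit_logistic([X[i] for i in train], [s[i] for i in train])
    c = sum(labeled_prob(model, X[i]) for i in holdout) / len(holdout)
    return model, c


def unrelated_prob(model, c, x):
    # Rescale g(x) by c to approximate P(positive | x), capped at 1.
    return min(1.0, labeled_prob(model, x) / c)


# Synthetic, hypothetical data: two made-up feature columns per failure.
rng = random.Random(42)
X, s = [], []
for _ in range(200):  # truly unrelated failures; only about 30% get a label
    X.append([rng.gauss(1.5, 1.0), rng.gauss(1.5, 1.0)])
    s.append(1 if rng.random() < 0.3 else 0)
for _ in range(200):  # patch-related failures; never labeled
    X.append([rng.gauss(-1.5, 1.0), rng.gauss(-1.5, 1.0)])
    s.append(0)

model, c = fit_pu(X, s)
score_flaky_like = unrelated_prob(model, c, [2.0, 2.0])
score_patch_like = unrelated_prob(model, c, [-2.0, -2.0])
```

The point of the sketch is that only some positives carry a label and nothing is ever labeled negative, yet the rescaled score still separates the two kinds of failure. A real pipeline would likely use an established library classifier and evaluate with precision and AUC on held-out data, as the paper reports.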

Key Points

Developers spend a median of 4 hours determining if a CI failure relates to their patch.
20% of unrelated failures are due to test flakiness or infrastructure, not code changes.
PU learning models using 33 features achieved 0.70–0.88 precision and 0.63–0.97 AUC across 7 Apache projects.

Why It Matters

Automatically flagging CI failures that are unrelated to a patch could save engineering teams hours per failure.
