Developer Tools

Google's Auto-Diagnose AI tool diagnoses test failures with 90% accuracy

arXiv cs.SE April 15, 2026

⚡ Google's new LLM tool reached 90.14% root-cause accuracy on 71 evaluated failures and has since analyzed over 52,000 failing tests.

Deep Dive

Google researchers have developed and deployed Auto-Diagnose, a tool that uses Large Language Models (LLMs) to diagnose the root cause of integration test failures automatically. Integration tests, which check how different software components work together, generate large, unstructured logs that are difficult for developers to parse. Auto-Diagnose analyzes these failure logs, identifies the most relevant lines, and produces a concise summary. It is integrated into Critique, Google's internal code review system, so developers receive the diagnosis in context, within their existing workflow, at the time they need it.
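The flow described above (scan a failure log, pick out the most relevant lines, hand them to an LLM for a summary) can be illustrated with a minimal sketch. This is not Google's implementation: the article does not describe how Auto-Diagnose selects lines or phrases its prompts. The severity keywords, the function names `select_relevant_lines` and `build_prompt`, the test name, and the sample log are all hypothetical, and no model is actually called.

```python
import re

# Assumption: lines containing these markers are treated as "relevant".
# The real tool's selection logic is not described in the article.
SEVERITY_PATTERN = re.compile(r"\b(ERROR|FATAL|Exception|Traceback|FAILED)\b")


def select_relevant_lines(log_text, context=1):
    """Return (line_number, text) pairs for matching lines plus their neighbors."""
    lines = log_text.splitlines()
    keep = set()
    for i, line in enumerate(lines):
        if SEVERITY_PATTERN.search(line):
            first = max(0, i - context)
            last = min(len(lines), i + context + 1)
            keep.update(range(first, last))
    # Line numbers are 1-based so they match what a developer sees in the log.
    return [(i + 1, lines[i]) for i in sorted(keep)]


def build_prompt(test_name, relevant):
    """Assemble a diagnosis prompt from the selected log lines (hypothetical wording)."""
    excerpt = "\n".join(f"{number}: {text}" for number, text in relevant)
    return (
        f"Integration test '{test_name}' failed.\n"
        "Log excerpt (line number: text):\n"
        f"{excerpt}\n"
        "State the most likely root cause in two sentences and cite the line numbers."
    )


# Hypothetical log, invented for illustration only.
SAMPLE_LOG = """\
INFO starting server on port 8080
INFO connecting to database
ERROR connection refused: db-host:5432
INFO retrying in 5s
INFO shutting down
"""

relevant = select_relevant_lines(SAMPLE_LOG)
prompt = build_prompt("checkout_flow_test", relevant)
print(prompt)
```

Filtering before prompting keeps the excerpt short, which matters when a single integration test produces far more log text than is practical to send to a model in full.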

A manual evaluation on 71 real-world failures found that the tool diagnosed the correct root cause in 90.14% of cases. Following its Google-wide deployment, the tool was used to analyze 52,635 distinct failing tests. User feedback was largely positive: the tool was marked 'Not helpful' in only 5.8% of cases. Among 370 tools that post findings in Critique, Auto-Diagnose ranked #14 in helpfulness. User interviews confirmed that developers found the tool useful and welcomed AI-powered diagnostic assistance inside their existing workflows.
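For readers who want the 90.14% figure as a raw count: the article gives only the percentage and the sample size of 71, so the count below is inferred by arithmetic rather than quoted from the study. The helper name `counts_matching` is hypothetical.

```python
TOTAL_CASES = 71          # sample size stated in the article
REPORTED_PERCENT = 90.14  # accuracy stated in the article


def counts_matching(reported_percent, total):
    """Return every count k where k/total, as a percentage, rounds to the reported value."""
    return [
        k for k in range(total + 1)
        if round(100 * k / total, 2) == reported_percent
    ]


matches = counts_matching(REPORTED_PERCENT, TOTAL_CASES)
print(matches)                   # counts consistent with the reported accuracy
print(TOTAL_CASES - matches[0])  # implied number of incorrect diagnoses
```

Only one count, 64 of 71, is consistent with the reported percentage, which implies 7 incorrect diagnoses in the evaluated sample.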

The research concludes that LLMs are highly effective for this task due to their ability to process and summarize complex textual data. The study also highlights that the tool's high accuracy is a critical factor driving developer adoption and positive perception. This represents a significant step in using AI to reduce cognitive load and save time on tedious debugging tasks, allowing engineers to focus on more creative problem-solving.

Key Points

Achieved 90.14% accuracy in root cause diagnosis on 71 real-world test failures.
Deployed at scale, analyzing 52,635 distinct failing tests with a 'Not helpful' rate of only 5.8%.
Integrated directly into Google's Critique system, ranking #14 in helpfulness among 370 internal tools.

Why It Matters

Saves developers hours of tedious log analysis, accelerating software development and improving code reliability at scale.
