From Flat Logs to Causal Graphs: Hierarchical Failure Attribution for LLM-based Multi-Agent Systems
New method transforms chaotic AI agent logs into structured graphs to pinpoint root causes.
A team of researchers has introduced CHIEF, a novel framework targeting a critical weakness of deployed LLM-based Multi-Agent Systems (MAS): their opaque and fragile failure modes. Current methods treat execution logs as flat sequences, an approach that fails to capture the complex, intertwined causal relationships between agents, leading to poor observability and ambiguous blame assignment. CHIEF addresses this directly by transforming these chaotic interaction trajectories into a structured, hierarchical causal graph, providing a clear map of agent dependencies and decision pathways.
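To make the flat-log-to-graph idea concrete, here is a minimal sketch in Python. This is an illustration of the general technique, not CHIEF's actual implementation; the `StepNode` structure, field names, and the `depends_on` log format are all assumptions for the example.

```python
# Illustrative sketch: turning a flat agent log into a causal graph.
# Not the paper's code; node/field names are assumptions.
from dataclasses import dataclass, field

@dataclass
class StepNode:
    agent: str                                    # which agent produced this step
    action: str                                   # e.g. "plan", "write_code"
    parents: list = field(default_factory=list)   # steps this one causally depends on
    children: list = field(default_factory=list)  # steps influenced by this one

def build_graph(log):
    """Link each log entry to the earlier steps it depends on.

    `log` is a list of dicts: {"agent", "action", "depends_on": [indices]}.
    Returns one StepNode per entry; edges encode agent-to-agent influence.
    """
    nodes = [StepNode(e["agent"], e["action"]) for e in log]
    for i, entry in enumerate(log):
        for j in entry.get("depends_on", []):
            nodes[i].parents.append(nodes[j])
            nodes[j].children.append(nodes[i])
    return nodes

# Toy trajectory: planner -> coder -> tester.
log = [
    {"agent": "planner", "action": "plan"},
    {"agent": "coder", "action": "write_code", "depends_on": [0]},
    {"agent": "tester", "action": "run_tests", "depends_on": [1]},
]
graph = build_graph(log)
```

Once the log is in this form, a failed final step can be traced back through `parents` instead of scanning the whole flat sequence.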
The technical core of CHIEF is a three-stage process. First, it constructs the hierarchical graph from logs. Second, it uses synthesized 'virtual oracles' to guide efficient backtracking, pruning the vast search space of potential failure points. Finally, it applies a progressive causal screening strategy based on counterfactual analysis, rigorously distinguishing the true root cause from mere symptoms that propagated through the system. In experiments on the Who&When benchmark, CHIEF outperformed eight strong baselines. This work marks a significant step from black-box debugging towards explainable, reliable multi-agent AI, which is essential for deploying these systems in high-stakes applications like software engineering, finance, and autonomous operations.
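The counterfactual-screening idea in the final stage can be sketched as follows. This is a simplified illustration of counterfactual root-cause testing in general, not CHIEF's algorithm; the `rerun_from` interface and the toy "oracle patch" are assumptions made for the example.

```python
# Illustrative sketch of counterfactual screening (assumed interface, not the
# paper's code): replace one step's output with an oracle value, re-run the
# rest, and see whether the failure disappears. Steps whose intervention
# flips the outcome are root-cause candidates; steps that merely relayed a
# bad value are symptoms.
def screen_root_causes(steps, rerun_from, failed):
    """Walk candidate steps backwards from the failure point.

    `rerun_from(i)` re-executes the trajectory with step i's output replaced
    by a virtual-oracle value and returns the final outcome.
    `failed(outcome)` tests whether the run still fails.
    """
    causes = []
    for i in reversed(range(len(steps))):
        if not failed(rerun_from(i)):   # failure vanished => causally implicated
            causes.append(i)
    return causes

# Toy example: step 1 injects the bug; patching it (and only it) repairs the run.
steps = ["ok", "bug", "ok"]

def rerun_from(i):
    patched = steps[:i] + ["oracle"] + steps[i + 1:]
    return "fail" if "bug" in patched else "pass"

result = screen_root_causes(steps, rerun_from, lambda o: o == "fail")
# result == [1]: only fixing step 1 flips the outcome, so it is the root cause
```

Patching step 2 still fails (the bug sits upstream), which is exactly the root-cause-versus-symptom distinction the screening stage makes.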
- CHIEF transforms flat execution logs into hierarchical causal graphs to model complex agent interactions.
- Uses synthesized virtual oracles and counterfactual analysis to pinpoint root causes, not just symptoms.
- Outperformed eight state-of-the-art baselines on the Who&When benchmark for failure attribution accuracy.
Why It Matters
Enables reliable debugging of complex AI agent teams, critical for deploying them in production software and autonomous systems.