
Multi-Agent LLM Simulator Recreates Nuclear Team Failures with Face-Validity Pass Rates up to 52.6%

AI model recreates the team dynamics behind the Chernobyl and Three Mile Island accidents, simulating a decision delay of 134.8 minutes against 138 minutes in the historical record.

Deep Dive

TEAM-SimHRA, developed by researchers including Xingyu Xiao, recasts human reliability analysis (HRA) for high-stakes team environments. Traditional HRA assigns fixed error probabilities to individual tasks, but it does not capture how team dynamics such as delayed diagnosis, suppressed dissent, and authority-driven error propagation contribute to catastrophic failures. This multi-agent LLM framework instead treats reliability as an emergent property of team interactions, simulating real-time communication and role-conditioned authority as an accident progresses.
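To make "authority-driven error propagation" concrete, here is a minimal toy sketch. It is not TEAM-SimHRA's method: it replaces the LLM agents with a fixed deference rule, and every name and number in it (`Operator`, `cascade_depth`, the roles, the authority and confidence values, the `pressure` parameter) is a hypothetical chosen for illustration. The idea is only that a flawed directive from the top of a chain travels downward until someone is confident enough to dissent.

```python
from dataclasses import dataclass


@dataclass
class Operator:
    role: str          # hypothetical role label
    authority: float   # hypothetical rank weight in [0, 1]
    confidence: float  # hypothetical confidence in one's own diagnosis, [0, 1]


def cascade_depth(chain, pressure=1.0):
    """Count how many subordinates accept a flawed directive from chain[0].

    Toy rule (an assumption, not taken from the paper): a subordinate defers
    when the authority gap to the issuer, scaled by `pressure`, exceeds the
    subordinate's own confidence. The first dissent stops the cascade.
    """
    issuer = chain[0]
    depth = 0
    for subordinate in chain[1:]:
        gap = issuer.authority - subordinate.authority
        if gap * pressure <= subordinate.confidence:
            break  # dissent: the directive is challenged here
        depth += 1
    return depth


# Hypothetical control-room chain, ordered from most to least senior.
chain = [
    Operator("shift supervisor", authority=0.9, confidence=0.0),
    Operator("senior operator", authority=0.6, confidence=0.2),
    Operator("reactor operator", authority=0.4, confidence=0.6),
    Operator("field technician", authority=0.2, confidence=0.3),
]

print(cascade_depth(chain))                # 1: the reactor operator dissents
print(cascade_depth(chain, pressure=2.0))  # 3: everyone defers
```

In this toy, raising `pressure` deepens the cascade without changing anyone's individual skill, which is the kind of team-level effect a fixed per-task error probability cannot express.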

Validated against two extensively documented nuclear accidents, Three Mile Island (1979) and Chernobyl (1986), TEAM-SimHRA achieved face-validity pass rates of 43.5% and 52.6%, respectively. It also reproduced several historical metrics: a simulated decision delay of 134.8 minutes against 138 minutes in the historical record (a gap of 3.2 minutes), communication suppression stability reported as perfect, and full authority pressure cascades at propagation depths matching the historical accounts. These results suggest that multi-agent LLM simulations can extract quantitative team-level reliability indicators that traditional methods do not provide, which could support dynamic probabilistic risk assessment in safety-critical settings such as nuclear plants, aviation, and military command centers.
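The decision-delay comparison is simple arithmetic, worked through below using only the two figures quoted above. The variable names are illustrative.

```python
simulated_min = 134.8   # simulated decision delay, from the reported results
historical_min = 138.0  # historical decision delay, from the reported results

abs_error = historical_min - simulated_min  # gap in minutes
rel_error = abs_error / historical_min      # gap as a fraction of the historical value

print(f"absolute error: {abs_error:.1f} min")  # absolute error: 3.2 min
print(f"relative error: {rel_error:.1%}")      # relative error: 2.3%
```

The simulated delay therefore falls about 2.3% short of the historical figure, a gap slightly larger than three minutes.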

Key Points
  • Uses multi-agent LLMs to model team interactions, not individual error rates
  • Validated against Three Mile Island (43.5% face-validity pass rate) and Chernobyl (52.6%)
  • Reproduced decision delay within 3.2 minutes of historical data (134.8 vs. 138 min)
  • Captures authority pressure cascades and communication suppression at propagation depths matching the historical accounts

Why It Matters

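If the results generalize, the framework could give risk analysts in nuclear, aviation, and other safety-critical team operations quantitative, team-level reliability indicators that task-by-task HRA does not provide.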