When the Chain Breaks: Interactive Diagnosis of LLM Chain-of-Thought Reasoning Errors
New system combines fact-checking and logic validation to pinpoint where LLM reasoning goes wrong.
A team of researchers led by Shiwei Chen has developed ReasonDiag, an interactive visualization system designed to diagnose errors in Large Language Model (LLM) Chain-of-Thought (CoT) reasoning. The system addresses a critical problem: while models like GPT-4 and Claude generate step-by-step reasoning traces to build trust, these traces are often lengthy and can contain subtle logical or factual errors that are difficult for users to spot. ReasonDiag tackles this by first running CoT outputs through a novel error detection pipeline that combines external fact-checking against verified sources with formal symbolic logic validation to pinpoint erroneous reasoning steps.
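To make the two-stage idea concrete, here is a minimal Python sketch: each reasoning step is checked against a fact source and, where it draws a conclusion, against a set of inference rules. The paper does not publish its implementation, so every name here (`Step`, `Diagnosis`, `VERIFIED_FACTS`, `INFERENCE_RULES`, `diagnose`) and the toy rule format are assumptions for illustration only.

```python
from dataclasses import dataclass, field

# Toy stand-in for ReasonDiag's external verified sources (hypothetical).
VERIFIED_FACTS = {
    "socrates is a man": True,
    "all men are mortal": True,
    "socrates is immortal": False,
}

# Toy inference rules for the symbolic validator: (premises, entailed conclusion).
INFERENCE_RULES = [
    ({"socrates is a man", "all men are mortal"}, "socrates is mortal"),
]

@dataclass
class Step:
    text: str                                    # claim asserted at this step
    premises: set = field(default_factory=set)   # earlier claims it builds on

@dataclass
class Diagnosis:
    index: int
    factual_error: bool
    logical_error: bool

def fact_check(claim: str) -> bool:
    """External check: contradicted claims fail; unknown claims pass here."""
    return VERIFIED_FACTS.get(claim, True)

def logic_check(step: Step) -> bool:
    """Symbolic check: a step that cites premises must match an inference rule."""
    if not step.premises:
        return True  # a bare premise derives nothing, so there is nothing to validate
    return any(step.premises == p and step.text == c for p, c in INFERENCE_RULES)

def diagnose(chain: list[Step]) -> list[Diagnosis]:
    """Run both checks over every step and flag each failure mode separately."""
    return [
        Diagnosis(i, not fact_check(s.text), not logic_check(s))
        for i, s in enumerate(chain)
    ]

if __name__ == "__main__":
    chain = [
        Step("socrates is a man"),
        Step("all men are mortal"),
        Step("socrates is immortal",
             premises={"socrates is a man", "all men are mortal"}),  # flawed step
    ]
    for d in diagnose(chain):
        print(d)
```

Running the script flags the final step on both checks: it contradicts the fact source and does not follow from its stated premises, which is exactly the separation of factual and logical error the pipeline is after.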
Building on this detection, ReasonDiag presents the findings through two core visualizations. An integrated arc diagram shows the distribution of reasoning steps and, crucially, how errors propagate through the chain. A hierarchical node-link diagram captures the high-level reasoning flow and the dependencies between premises. The researchers evaluated the system through technical benchmarks, case studies, and user interviews with 16 participants. The work has been accepted for publication at EuroVis 2026, and the results indicate that ReasonDiag significantly helps users understand complex reasoning traces, efficiently identify erroneous steps, and determine the underlying root causes of those errors, moving beyond simple output checking toward a genuine reasoning audit.
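The propagation view rests on a simple observation: once a step is flagged, every downstream step that builds on it becomes suspect. The sketch below walks a step-dependency graph to compute that tainted set; the traversal is an assumed reconstruction of the idea, not ReasonDiag's published algorithm, and `propagate_errors` is a hypothetical name.

```python
from collections import defaultdict

def propagate_errors(deps: dict[int, set[int]], flagged: set[int]) -> set[int]:
    """Return every step reachable from a flagged step via dependency edges.

    deps maps each step index to the earlier steps it builds on, mirroring
    the dependency structure an arc diagram would draw.
    """
    # Invert the map: which later steps consume each step's conclusion?
    consumers = defaultdict(set)
    for step, premises in deps.items():
        for p in premises:
            consumers[p].add(step)

    # Breadth-first taint propagation from the flagged steps.
    tainted = set(flagged)
    frontier = list(flagged)
    while frontier:
        current = frontier.pop()
        for nxt in consumers[current]:
            if nxt not in tainted:
                tainted.add(nxt)
                frontier.append(nxt)
    return tainted

if __name__ == "__main__":
    # Steps 0-4; step 2 depends on 0 and 1, step 3 on 2, step 4 on 1 only.
    deps = {0: set(), 1: set(), 2: {0, 1}, 3: {2}, 4: {1}}
    print(sorted(propagate_errors(deps, flagged={2})))  # -> [2, 3]
```

Note that step 4 stays clean even though it shares a premise with the flawed step, which is the kind of distinction a plain pass/fail check on the final answer cannot make.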
- Combines external fact-checking and symbolic logic validation to create an error detection pipeline for LLM reasoning steps.
- Provides dual visualization via arc diagrams for error propagation and node-link diagrams for premise dependency mapping.
- Validated through user interviews with 16 participants, showing improved ability to understand traces and find error root causes.
Why It Matters
Enables professionals to audit and trust complex AI reasoning in critical domains like finance, law, and medicine.