Multi-agent LLM framework extracts knowledge graphs for Ethernet switch testing with 0.97–0.99 correctness
A new approach turns semi-structured manuals into testable knowledge with near-perfect extraction correctness...
A team of researchers led by Rongqi Pan has introduced a multi-agent LLM framework that extracts structured knowledge graphs (KGs) from Ethernet switch configuration manuals (ESCMs), a document type that is notoriously difficult to process because of its semi-structured formatting, implicit step attributes, and complex cross-section dependencies. Their work, published on arXiv (paper 2605.19180), focuses on system testing automation but is intended as a general framework adaptable to other industrial domains. The approach combines a fine-grained KG schema with an iterative Extract-Evaluate-Improve (EEI) loop: LLMs first extract candidate facts, the facts are then evaluated against ground truth, and the extraction prompts are finally refined for hard cases.
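The control flow of an Extract-Evaluate-Improve loop can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' implementation: the triple-based `Triple` type, the `run_eei` function, the 0.97 stopping threshold, the round limit, and the three toy agent functions are all hypothetical stand-ins, and correctness is assumed here to be the fraction of extracted facts judged correct.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple

# Simplified stand-in for a KG fact: (subject, relation, object).
# The paper's fine-grained schema is not reproduced here.
Triple = Tuple[str, str, str]


@dataclass
class EEIResult:
    triples: List[Triple]
    correctness: float
    prompt: str
    rounds: int  # number of prompt refinements performed


def correctness_score(verdicts: List[bool]) -> float:
    """Fraction of extracted triples judged correct (assumed metric)."""
    return sum(verdicts) / len(verdicts) if verdicts else 0.0


def run_eei(
    manual: str,
    prompt: str,
    extract: Callable[[str, str], List[Triple]],
    evaluate: Callable[[str, List[Triple]], List[bool]],
    improve: Callable[[str, List[Triple]], str],
    threshold: float = 0.97,  # hypothetical stopping criterion
    max_rounds: int = 3,      # hypothetical refinement budget
) -> EEIResult:
    rounds = 0
    while True:
        triples = extract(manual, prompt)        # Extract
        verdicts = evaluate(manual, triples)     # Evaluate
        score = correctness_score(verdicts)
        if score >= threshold or rounds >= max_rounds:
            return EEIResult(triples, score, prompt, rounds)
        rejected = [t for t, ok in zip(triples, verdicts) if not ok]
        prompt = improve(prompt, rejected)       # Improve
        rounds += 1


# Toy agents standing in for LLM calls (purely illustrative).
def toy_extract(manual: str, prompt: str) -> List[Triple]:
    triples = [("step_1", "configures", "vlan 10")]
    if "implicit" in prompt:
        triples.append(("step_2", "requires", "step_1"))
    else:
        triples.append(("step_2", "requires", "step_9"))  # wrong dependency
    return triples


def toy_evaluate(manual: str, triples: List[Triple]) -> List[bool]:
    return [t != ("step_2", "requires", "step_9") for t in triples]


def toy_improve(prompt: str, rejected: List[Triple]) -> str:
    return prompt + " Resolve implicit dependencies between steps."


result = run_eei(
    "toy manual text",
    "Extract each configuration step as a triple.",
    toy_extract,
    toy_evaluate,
    toy_improve,
)
print(result.correctness, result.rounds)
```

In this toy run the first extraction contains one wrong dependency, so the prompt is refined once and the second extraction passes the threshold. In a real system each of the three callables would wrap an LLM agent, and the refinement step would be specific to the manual being processed.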
Tested on 50 real-world ESCMs, the framework achieved average extraction correctness scores between 0.97 and 0.99 across three extraction tasks using the original prompts. For challenging manuals, the EEI loop further improved correctness through manual-specific prompt refinement. LLM judgments also showed substantial agreement with human evaluators, with Cohen's kappa values exceeding 0.72 for all tasks. Industrial testers reported that the resulting knowledge graphs enabled the generation of useful and correct test case specifications (TCSs), paving the way for more automated system testing in networking and beyond.
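For readers unfamiliar with the agreement statistic, Cohen's kappa corrects raw agreement for the agreement expected by chance: kappa = (p_o - p_e) / (1 - p_e), where p_o is the observed agreement rate and p_e is the chance agreement computed from each rater's label frequencies. The sketch below computes it for two raters. The function name `cohens_kappa` and the label lists are illustrative toy data, not figures from the paper.

```python
from collections import Counter
from typing import Hashable, Sequence


def cohens_kappa(rater_a: Sequence[Hashable], rater_b: Sequence[Hashable]) -> float:
    """Cohen's kappa for two raters labelling the same items."""
    if len(rater_a) != len(rater_b) or len(rater_a) == 0:
        raise ValueError("both raters must label the same non-empty set of items")
    n = len(rater_a)

    # Observed agreement: share of items with identical labels.
    p_o = sum(a == b for a, b in zip(rater_a, rater_b)) / n

    # Chance agreement: product of the raters' marginal label frequencies.
    counts_a, counts_b = Counter(rater_a), Counter(rater_b)
    labels = set(counts_a) | set(counts_b)
    p_e = sum(counts_a[label] * counts_b[label] for label in labels) / (n * n)

    if p_e == 1.0:  # both raters used a single identical label throughout
        return 1.0
    return (p_o - p_e) / (1 - p_e)


# Toy verdicts: 1 = "extracted fact is correct", 0 = "incorrect".
llm_judge = [1, 1, 1, 1, 1, 1, 1, 0, 0, 0]
human = [1, 1, 1, 1, 1, 1, 0, 0, 0, 0]
kappa = cohens_kappa(llm_judge, human)
print(round(kappa, 2))  # 0.78
```

Here the raters agree on 9 of 10 items (p_o = 0.9) while chance agreement is 0.54, giving a kappa of about 0.78. Values between 0.61 and 0.80 are conventionally read as substantial agreement.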
- Multi-agent LLM framework uses an iterative Extract-Evaluate-Improve loop to handle complex, semi-structured documents
- Achieves average extraction correctness of 0.97–0.99 on 50 real-world Ethernet switch configuration manuals
- LLM judgments agree substantially with human evaluators (Cohen's kappa above 0.72 on all tasks), and the generated KGs support downstream test case generation
Why It Matters
The framework turns messy technical manuals into machine-usable knowledge graphs, enabling automated system testing and reducing manual test design effort.