Research & Papers

LLM-Augmented Knowledge Base Construction For Root Cause Analysis

A new study compares Fine-Tuning, RAG, and Hybrid LLM approaches to automate root cause analysis for critical network failures.

Deep Dive

A team of researchers from Concordia University and Ericsson has published a paper, accepted by IEEE Access, detailing a novel method for automating a critical but time-consuming IT task: building a knowledge base for Root Cause Analysis (RCA). When network outages occur, engineers must sift through thousands of past support tickets to find similar incidents and solutions. This research automates that process by using Large Language Models (LLMs) to digest historical ticket data and construct a structured, queryable knowledge base.

The study compared three LLM methodologies: Fine-Tuning a base model on the ticket data, using a Retrieval-Augmented Generation (RAG) system to fetch relevant ticket information, and a Hybrid approach combining both. Tested on a real-world industrial dataset, the methods were evaluated with both lexical and semantic similarity metrics to assess whether the generated knowledge was accurate and contextually relevant. The results indicate that an LLM-augmented knowledge base can serve as an excellent starting point for engineers, shortening the initial diagnostic phase of an outage.
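
The difference between lexical and semantic similarity can be illustrated with a minimal Python sketch. This summary does not name the paper's exact metrics, so the two functions below are common stand-ins chosen for illustration: a token-overlap F1 score for the lexical side, and cosine similarity between embedding vectors for the semantic side. The ticket texts and the vectors are made up, and in practice the vectors would come from a sentence-embedding model of your choice.

```python
import math
import re
from collections import Counter


def tokens(text):
    """Lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def token_f1(candidate, reference):
    """Lexical similarity: harmonic mean of token precision and recall."""
    cand, ref = Counter(tokens(candidate)), Counter(tokens(reference))
    overlap = sum((cand & ref).values())  # shared tokens, counted with multiplicity
    if overlap == 0:
        return 0.0
    precision = overlap / sum(cand.values())
    recall = overlap / sum(ref.values())
    return 2 * precision * recall / (precision + recall)


def cosine(u, v):
    """Semantic similarity: cosine of the angle between two embedding vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


# Hypothetical generated answer and reference resolution (not from the dataset).
generated = "Restart the baseband unit after the license expired"
reference = "License expired; restart the baseband unit"
print(f"lexical F1: {token_f1(generated, reference):.3f}")

# Placeholder vectors standing in for real sentence embeddings.
print(f"semantic cosine: {cosine([0.2, 0.7, 0.1], [0.25, 0.65, 0.05]):.3f}")
```

A lexical score rewards shared wording, while an embedding-based score can still be high when two texts describe the same fix in different words, which is why evaluations of generated text often report both.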

This work directly tackles the immense pressure on communications providers to maintain 'five 9s' (99.999%) reliability. By transforming unstructured, natural-language ticket archives into a structured knowledge resource, the system enables faster identification of failure patterns and proven solutions. This acceleration in RCA is crucial for minimizing downtime, restoring service rapidly, and ultimately improving overall network resilience against future disruptions.
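
To put the 'five 9s' target in concrete terms, the short calculation below converts an availability percentage into the downtime it allows per year. It is a back-of-the-envelope sketch rather than anything taken from the paper, and it assumes a 365-day year; the function name is illustrative.

```python
MINUTES_PER_YEAR = 365 * 24 * 60  # 525,600 minutes, ignoring leap years


def downtime_minutes_per_year(availability_percent):
    """Return the yearly downtime budget implied by an availability target."""
    return MINUTES_PER_YEAR * (1 - availability_percent / 100)


for target in (99.9, 99.99, 99.999):
    print(f"{target}% -> {downtime_minutes_per_year(target):.2f} minutes/year")
```

At 99.999% the budget works out to roughly 5.26 minutes of downtime per year, which leaves very little time for manual ticket searches during an incident.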

Key Points
  • The study tested three LLM approaches (Fine-Tuning, RAG, and Hybrid) to build an RCA knowledge base from unstructured support tickets.
  • Evaluation on real industrial data used lexical and semantic similarity metrics to assess the quality of the generated knowledge.
  • The automated system provides engineers a rapid starting point for diagnostics, directly supporting the goal of 99.999% network reliability.

Why It Matters

The approach automates a manual, critical process for network engineers, potentially shortening outage resolution times and improving infrastructure reliability.