Developer Tools

Large Language Models for Analyzing Enterprise Architecture Debt in Unstructured Documentation

A new study shows fine-tuned LLMs can automatically find hidden IT design flaws in unstructured corporate documents.

Deep Dive

A team of researchers has developed a method for automating the detection of Enterprise Architecture Debt (EA Debt) using large language models (LLMs). EA Debt refers to the accumulating cost of suboptimal design decisions and misaligned components in an organization's IT landscape. The study, led by Christin Pagels, Simon Hacks, and Rob Henk Bemthuis, addresses a key gap: early warning signs called 'Enterprise Architecture Smells' (EA Smells) exist, but they are typically identified manually or only from structured data. That leaves large volumes of unstructured documentation, such as strategy papers and process descriptions, unanalyzed.

Following a design science research approach, the team built and evaluated an LLM-based prototype. This artifact ingests unstructured text, applies specialized detection models, and outputs the architectural flaws it identifies. The evaluation was a case study on synthetic yet realistic business documents, in which a custom model built on a GPT foundation served as the benchmark for a fine-tuned, on-premise LLM. The results revealed a practical trade-off: the GPT-based benchmark model achieved higher precision and faster processing, while the fine-tuned on-premise model offered significant advantages in data protection and security for sensitive corporate information.
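The ingest, detect, output flow and the precision metric described above can be illustrated with a minimal Python sketch. Everything in it is an assumption made for illustration, not the study's implementation: the names `Finding`, `keyword_detector`, `detect_smells`, and `precision` are hypothetical, the smell label and cue phrases are made up, and a simple keyword matcher stands in for the LLM-based detection models.

```python
from dataclasses import dataclass
from typing import Callable, Iterable


@dataclass(frozen=True)
class Finding:
    smell: str     # label of the suspected EA Smell (illustrative only)
    evidence: str  # passage that triggered the finding


# A detector maps one passage of text to zero or more findings.
# In the study this role is played by an LLM; here it is a plain function.
Detector = Callable[[str], list[Finding]]


def keyword_detector(smell: str, cues: Iterable[str]) -> Detector:
    """Hypothetical stand-in for an LLM detector: flags a passage if any cue appears."""
    lowered_cues = tuple(cue.lower() for cue in cues)

    def detect(passage: str) -> list[Finding]:
        text = passage.lower()
        if any(cue in text for cue in lowered_cues):
            return [Finding(smell, passage)]
        return []

    return detect


def split_passages(document: str) -> list[str]:
    """Ingest step: naive split on periods (an assumption; real documents need more care)."""
    return [part.strip() for part in document.split(".") if part.strip()]


def detect_smells(document: str, detectors: Iterable[Detector]) -> list[Finding]:
    """Run every detector over every passage and collect the findings."""
    detector_list = list(detectors)
    findings: list[Finding] = []
    for passage in split_passages(document):
        for detector in detector_list:
            findings.extend(detector(passage))
    return findings


def precision(predicted: set[str], expected: set[str]) -> float:
    """Share of predicted smells that are correct: TP / (TP + FP)."""
    if not predicted:
        return 0.0
    return len(predicted & expected) / len(predicted)


if __name__ == "__main__":
    doc = (
        "Billing data is maintained manually in two separate systems. "
        "The CRM is hosted in the cloud."
    )
    detectors = [keyword_detector("Duplication", ["two separate systems"])]
    for finding in detect_smells(doc, detectors):
        print(f"{finding.smell}: {finding.evidence}")
```

Keeping each detector behind a single function signature means a hosted model and an on-premise model could in principle be swapped without touching the rest of the pipeline, which is the kind of side-by-side comparison the case study describes.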

The findings, accepted for publication at the 41st ACM/SIGAPP Symposium on Applied Computing (SAC '26), point to a practical path forward for IT governance. By automating the detection of EA Smells in everyday documentation, organizations can shift from reactive to proactive management of their technical debt. The approach could be integrated into continuous governance practices, helping architects identify risks earlier and prioritize remediation based on evidence from their own document corpus. The intended outcome is a more resilient and cost-effective IT landscape.

Key Points
  • The prototype automates detection of 'EA Smells'—early indicators of IT design debt—from unstructured text like process docs and strategy papers.
  • Benchmarking showed that a custom GPT model achieved higher precision and speed, while a fine-tuned on-premise model offered stronger data protection for sensitive information.
  • The study provides a blueprint for integrating LLM-based smell detection into proactive Enterprise Architecture governance practices.

Why It Matters

The approach automates a manual, error-prone audit task, letting IT architects manage technical debt proactively and align systems with business strategy using documents they already have.