Developer Tools

Code2UML uses 5 Claude agents to auto-generate UML from code

91.5% mean syntactic validity across 12 repos and 4 languages, with a compaction step that needs zero LLM calls

Deep Dive

A new paper from researchers at the University of Bucharest introduces Code2UML, an agentic system that automatically generates UML diagrams from source code using large language models. The architecture employs five specialized agents built on the Claude Agent SDK: PlannerAgent, AnalyzerAgent, DiagramAgent, CorrectorAgent, and DependencyAnalyzerAgent, each handling a distinct cognitive subtask. A key innovation is a deterministic, importance-weighted compaction layer for the intermediate representation (IR). It transforms the full project IR into diagram-specific views that are guaranteed to fit within token constraints, requires no LLM calls, and completes in milliseconds.
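
To make the compaction idea concrete, here is a minimal Python sketch of one way a deterministic, importance-weighted selection under a token budget could work. It is not the paper's implementation: the Entity fields, the compact function, the greedy strategy, and the sample weights and token costs are all assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Entity:
    name: str
    kind: str          # e.g. "class" or "function" (hypothetical categories)
    importance: float  # hypothetical precomputed weight, higher = more relevant
    tokens: int        # hypothetical estimated token cost of rendering the entity


def compact(entities, kinds, budget):
    """Greedily keep the most important entities of the requested kinds.

    Returns (selected, used_tokens). The result never exceeds `budget`,
    and identical input always yields identical output.
    """
    candidates = [e for e in entities if e.kind in kinds]
    # Importance descending; name ascending breaks ties deterministically.
    candidates.sort(key=lambda e: (-e.importance, e.name))
    selected, used = [], 0
    for e in candidates:
        if used + e.tokens <= budget:
            selected.append(e)
            used += e.tokens
    return selected, used


# Illustrative project IR (made-up values).
project_ir = [
    Entity("OrderService", "class", 0.9, 50),
    Entity("LegacyAdapter", "class", 0.5, 60),
    Entity("parse_args", "function", 0.8, 10),
    Entity("Order", "class", 0.7, 40),
]

# A class-diagram view limited to 100 tokens.
view, used = compact(project_ir, {"class"}, 100)
print([e.name for e in view], used)
```

Because the selection is a filter, a sort, and a single pass, it involves no model calls, and the budget check inside the loop is what keeps the view within the limit in this sketch.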

Code2UML was evaluated across 12 open-source repositories in four programming languages (Java, JavaScript, PHP, Python) and seven UML diagram types, producing 84 observations (12 repositories × 7 diagram types) assessed on five automated metrics. Results showed high syntactic validity (mean 91.5%, with component and deployment diagrams reaching 100%), strong relationship precision (mean 0.858), and consistent structural quality (mean 81.7/100 with cross-language variance of only 3.1 points). Entity recall averaged 0.313, reflecting deliberate architectural prioritization over exhaustive coverage. A sensitivity analysis found that quality scores remain stable across project sizes from 31 to 4,578 IR entities, suggesting that output quality does not degrade as codebases grow.
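
The combination of high relationship precision and low entity recall is easier to see with a small worked example. The sketch below assumes the standard set-based definitions of precision and recall; this summary does not give the paper's exact formulas, and the entity and relationship names are invented for illustration.

```python
def precision(predicted: set, reference: set) -> float:
    """Fraction of predicted items that also appear in the reference."""
    return len(predicted & reference) / len(predicted) if predicted else 0.0


def recall(predicted: set, reference: set) -> float:
    """Fraction of reference items that the prediction recovered."""
    return len(predicted & reference) / len(reference) if reference else 0.0


# Hypothetical ground truth: ten entities exist in the code.
reference_entities = {f"Class{i}" for i in range(10)}
# The diagram deliberately shows only the three most central ones.
diagram_entities = {"Class0", "Class1", "Class2"}

# Hypothetical relationships, written as (source, target) pairs.
reference_edges = {("Class0", "Class1"), ("Class1", "Class2"), ("Class2", "Class3")}
diagram_edges = {("Class0", "Class1"), ("Class1", "Class2")}

entity_recall = recall(diagram_entities, reference_entities)
relationship_precision = precision(diagram_edges, reference_edges)
print(entity_recall, relationship_precision)
```

In this toy case the diagram omits most entities, so recall is low, yet every relationship it draws is correct, so precision is high. That is the trade-off the reported averages point to.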

Key Points
  • Five specialized agents (Planner, Analyzer, Diagram, Corrector, DependencyAnalyzer) built on the Claude Agent SDK automate the full UML generation pipeline
  • Deterministic IR compaction runs in milliseconds with zero LLM calls, ensuring token-limit compliance even for large codebases (tested up to 4,578 entities)
  • Mean results of 91.5% syntactic validity, 0.858 relationship precision, and 81.7/100 structural quality across 84 observations in 4 languages

Why It Matters

Automates tedious software documentation, making UML generation practical for real-world codebases without manual trimming and without spending extra LLM calls just to fit the project into the context window.