Class Model Generation from Requirements using Large Language Models
New study shows LLMs can generate accurate UML class diagrams, matching human evaluators 80% of the time.
A team of researchers including Jackson Nguyen, Rui En Koe, and Fanyu Wang has published research on automating software design with large language models. Their paper, "Class Model Generation from Requirements using Large Language Models," demonstrates how state-of-the-art LLMs including GPT-5, Claude Sonnet 4.0, Gemini 2.5 Flash Thinking, and Llama-3.1-8B-Instruct can generate UML class diagrams directly from natural language requirements. The researchers used chain-of-thought prompting to extract domain entities, attributes, and associations, then generated the corresponding PlantUML representations. The approach targets what has traditionally been a manual, resource-intensive phase of software engineering.
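The final step of that pipeline, turning extracted entities, attributes, and associations into PlantUML, can be illustrated with a minimal Python sketch. The `DomainClass` and `Association` types, the `to_plantuml` function, and the Customer/Order example are hypothetical names introduced here for illustration. They are not taken from the paper, and the LLM extraction step itself is not shown.

```python
from dataclasses import dataclass, field


@dataclass
class DomainClass:
    """A domain entity with its attributes (hypothetical structure)."""
    name: str
    attributes: list = field(default_factory=list)


@dataclass
class Association:
    """A relationship between two domain classes (hypothetical structure)."""
    source: str
    target: str
    label: str = ""
    source_mult: str = ""
    target_mult: str = ""


def to_plantuml(classes, associations):
    """Render extracted classes and associations as PlantUML source text."""
    lines = ["@startuml"]
    for cls in classes:
        lines.append(f"class {cls.name} {{")
        for attr in cls.attributes:
            lines.append(f"  {attr}")
        lines.append("}")
    for a in associations:
        # Multiplicities are optional and are quoted in PlantUML syntax.
        left = f'{a.source} "{a.source_mult}"' if a.source_mult else a.source
        right = f'"{a.target_mult}" {a.target}' if a.target_mult else a.target
        suffix = f" : {a.label}" if a.label else ""
        lines.append(f"{left} -- {right}{suffix}")
    lines.append("@enduml")
    return "\n".join(lines)


# Assumed example: what an extraction step might return for the requirement
# "A customer places orders; each order has a total."
classes = [
    DomainClass("Customer", ["name : String"]),
    DomainClass("Order", ["total : Decimal"]),
]
associations = [Association("Customer", "Order", "places", "1", "0..*")]

diagram = to_plantuml(classes, associations)
print(diagram)
```

Keeping the extracted model in a structured form before rendering makes it possible to inspect or correct entities and associations before any diagram text is produced.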
To validate their approach, the team created a comprehensive dual-validation framework that integrates LLM-as-a-Judge methodology with human-in-the-loop assessment. They tested the system across eight heterogeneous datasets, evaluating generated diagrams across five quality dimensions: completeness, correctness, conformance to standards, comprehensibility, and terminological alignment. Two independent LLM judges (Grok and Mistral) performed structured pairwise comparisons, with their judgments validated against expert evaluations. The results show LLMs can generate structurally coherent and semantically meaningful UML diagrams with substantial alignment to human evaluators, highlighting their potential as both modeling assistants and reliable evaluators in automated requirements engineering workflows.
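To make the idea of validating judge verdicts against expert evaluations concrete, the sketch below compares one LLM judge's pairwise preferences with a human expert's on the same comparisons. The article does not say which agreement statistic the authors used. Raw percent agreement and Cohen's kappa are shown here as common choices, and the verdict lists are made-up illustrative data, not results from the study.

```python
from collections import Counter


def percent_agreement(a, b):
    """Fraction of comparisons on which two raters gave the same verdict."""
    if len(a) != len(b) or not a:
        raise ValueError("verdict lists must be non-empty and equal in length")
    return sum(x == y for x, y in zip(a, b)) / len(a)


def cohens_kappa(a, b):
    """Cohen's kappa: observed agreement corrected for chance agreement."""
    n = len(a)
    p_observed = percent_agreement(a, b)
    counts_a, counts_b = Counter(a), Counter(b)
    # Chance agreement: probability both raters pick the same label at random,
    # given each rater's own label frequencies.
    p_expected = sum(
        counts_a[label] * counts_b[label] for label in set(a) | set(b)
    ) / (n * n)
    if p_expected == 1.0:
        return 1.0  # both raters always used the same single label
    return (p_observed - p_expected) / (1 - p_expected)


# Illustrative data only: for each pairwise comparison, which of two
# candidate diagrams ("A" or "B") the rater preferred.
judge = ["A", "A", "B", "B", "A", "B", "A", "B"]
human = ["A", "A", "B", "A", "A", "B", "B", "B"]

print(percent_agreement(judge, human))  # 0.75
print(cohens_kappa(judge, human))       # 0.5
```

Kappa is lower than raw agreement here because half of the matches would be expected by chance when both raters split their preferences evenly between the two options.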
The research, accepted to the Eighth Workshop on Modeling and Simulation of Software-Intensive Systems at ICSE 2026, represents a practical step toward AI-assisted software development. By demonstrating consistency between LLM-based and human-based assessments, the study provides concrete insights into the capabilities and limitations of LLM-driven UML automation. This work could significantly reduce the time and expertise required for initial software design phases while maintaining quality standards through automated validation mechanisms.
- Tested GPT-5, Claude Sonnet 4.0, Gemini 2.5 Flash Thinking, and Llama-3.1-8B-Instruct across eight heterogeneous datasets
- Used a dual-validation framework with LLM judges (Grok and Mistral) and human experts for assessment
- Achieved substantial alignment between LLM-generated UML diagrams and human evaluator standards
Why It Matters
LLM-generated class models could automate time-consuming software design tasks, potentially cutting the initial design phase from days to minutes while maintaining quality.