Developer Tools

Can ChatGPT Generate Realistic Synthetic System Requirement Specifications? Results of a Case Study

Researchers used ChatGPT to create 300 system requirement specs across 10 industries.

Deep Dive

A team of researchers from RWTH Aachen University published a study investigating whether OpenAI's ChatGPT could generate realistic Synthetic System Requirement Specifications (SSyRS). These documents are crucial, natural-language artifacts in software engineering, but real ones are scarce for research due to confidentiality. The team aimed to address this scarcity by using ChatGPT's black-box generation capabilities, despite known challenges like hallucinations and overconfidence.

Using a systematic methodology involving prompt patterns, LLM-based quality assessments, and iterative refinements, the researchers generated 300 SSyRSs across 10 different industries. They then evaluated the output through cross-model checks and an expert survey with 87 participants. The results showed that 62% of experts considered the AI-generated specifications to be realistic, demonstrating a significant potential use case for LLMs in creating training or benchmarking data.
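The generate-assess-refine loop described above can be sketched in Python. This is an illustrative sketch only, not the authors' pipeline: `call_llm` and `assess_quality` are hypothetical stand-ins for the actual ChatGPT calls (generation and LLM-based quality grading), and the threshold and round limit are assumed values.

```python
def call_llm(prompt: str) -> str:
    """Stand-in for a ChatGPT call; a real pipeline would wrap an API client."""
    return f"Draft spec for: {prompt}"

def assess_quality(spec: str) -> float:
    """Stand-in for an LLM-based quality assessment returning a 0-1 score.

    A real pipeline would prompt a model to grade the draft; here the stub
    simply scores refined drafts higher so the loop terminates.
    """
    return 0.9 if "refined" in spec else 0.5

def generate_spec(industry: str, threshold: float = 0.8, max_rounds: int = 3) -> str:
    """Draft a spec, then iteratively refine it until it scores above threshold."""
    spec = call_llm(f"Write a system requirement specification for the {industry} industry.")
    for _ in range(max_rounds):
        if assess_quality(spec) >= threshold:
            break
        # Feed the current draft back with a refinement instruction.
        # The "(refined)" tag only exists so the stub assessor's score changes.
        spec = call_llm(f"Improve this draft: {spec}") + " (refined)"
    return spec

print(generate_spec("healthcare"))
```

In the study this loop ran at scale (300 documents across 10 industries), with the key caveat that the automated assessment step alone was not a reliable quality gate.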

However, a deeper analysis revealed persistent flaws, including contradictory statements and other deficiencies within the generated documents. The study concludes that while ChatGPT can generate somewhat realistic SSyRSs, LLM-based quality assessments are insufficient on their own and cannot replace thorough human expert evaluation. The paper detailing this methodology and the key insights will appear in the proceedings of the ENASE 2026 conference.

Key Points
  • Researchers generated 300 Synthetic System Requirement Specs using ChatGPT across 10 industries.
  • In an expert survey (n=87), 62% of participants rated the AI-generated specs as realistic.
  • In-depth review found flaws like contradictory statements, showing LLMs can't fully replace expert evaluation.

Why It Matters

Shows AI can help create software training data, but highlights the critical need for human oversight to catch errors.