Structural Quality Gaps in Practitioner AI Governance Prompts: An Empirical Study Using a Five-Principle Evaluation Framework
A new evaluation framework reveals critical structural gaps in how practitioners constrain AI agents.
AI governance prompts—the natural language instructions that define an AI agent's mandate, scope, and quality criteria—are increasingly critical as AI systems take on more autonomous roles. Until now, however, there has been no systematic way to evaluate whether these prompts are structurally complete. In a new empirical study published on arXiv, researcher Christo Zietsman proposes a five-principle evaluation framework grounded in computability theory, proof theory, and Bayesian epistemology. Applying the framework to a corpus of 34 publicly available CursorRules governance files from GitHub, the study finds that 37% of evaluated file-model pairs score below its structural completeness threshold. The most frequently missing components are data classification and assessment rubric criteria, suggesting that practitioners often overlook key specifications needed to ensure safe and reliable AI behavior.
The findings have significant implications for requirements engineering in AI-assisted development. The study identifies a previously undocumented artifact classification gap in the CursorRules convention, suggesting that current practice may not capture the full scope of governance requirements. Zietsman argues that these structural patterns are consistent enough for automated static analysis to detect and remediate, potentially catching governance failures before they reach production. The research accordingly proposes directions for tool support, such as linters or validators that check governance prompts for completeness. As organizations increasingly rely on AI agents for critical tasks, ensuring governance prompts are structurally sound could become a standard part of the software engineering workflow, much as code reviews and automated testing are today.
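To make the proposed tool support concrete, here is a minimal sketch of what such a completeness linter might look like. This is an illustrative assumption, not the paper's implementation: the five component names are inferred from the article's wording (mandate, scope, quality criteria, data classification, assessment rubric), and the keyword patterns and the four-of-five threshold are placeholders standing in for whatever definitions the framework actually specifies.

```python
# Hypothetical governance-prompt completeness linter (illustrative only).
# Component names are inferred from the article; patterns and threshold
# are placeholder assumptions, not the paper's actual criteria.
import re

COMPONENTS = {
    "mandate": r"\b(mandate|purpose|role)\b",
    "scope": r"\b(scope|boundar(?:y|ies)|out[- ]of[- ]scope)\b",
    "quality_criteria": r"\b(quality|acceptance criteria|standards?)\b",
    "data_classification": r"\b(data classification|sensitivity|confidential)\b",
    "assessment_rubric": r"\b(rubric|scoring|grading|assessment)\b",
}

THRESHOLD = 4  # assumed: at least 4 of 5 components must be present


def lint_governance_prompt(text: str) -> dict:
    """Check a governance prompt for the five structural components."""
    present = {
        name: bool(re.search(pattern, text, re.IGNORECASE))
        for name, pattern in COMPONENTS.items()
    }
    missing = [name for name, found in present.items() if not found]
    score = len(COMPONENTS) - len(missing)
    return {"score": score, "missing": missing, "complete": score >= THRESHOLD}


if __name__ == "__main__":
    # A prompt that, like many files in the study's corpus, omits
    # data classification and an assessment rubric.
    sample = (
        "Mandate: assist with code review.\n"
        "Scope: only files under src/.\n"
        "Quality criteria: all suggestions must compile.\n"
    )
    print(lint_governance_prompt(sample))
    # -> {'score': 3, 'missing': ['data_classification',
    #     'assessment_rubric'], 'complete': False}
```

A production validator would need the framework's actual principle definitions and per-model evaluation logic; keyword matching is only a first approximation of structural completeness.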
Key Findings
- 37% of evaluated AI governance prompt file-model pairs are structurally incomplete
- Study analyzed 34 public CursorRules governance files from GitHub
- Data classification and assessment rubrics are the most commonly missing components
Why It Matters
Automated static analysis could detect and remediate gaps in AI governance prompts, helping prevent agent misbehavior in production.