Is Your AI Model Accurate Enough? The Difficult Choices Behind Rigorous AI Development and the EU AI Act
A landmark paper argues the EU AI Act's 'appropriate accuracy' requirement forces hidden value judgments.
A team of six researchers from Maastricht University and the University of Southampton has published a pivotal paper challenging the notion that AI model 'accuracy' is a purely technical, objective property. Their work, set to appear at the 2026 ACM FAccT conference, uses the European Union's 2024 AI Act as a case study. The Act mandates an 'appropriate level of accuracy' for high-risk AI systems, but the paper argues this requirement forces a series of hidden, context-dependent value judgments. The authors call these 'techno-normative choices': decisions that determine which errors are prioritized and how risks are distributed across society, and that are therefore crucial to any rigorous deployment.
The paper identifies and analyzes four core choices that shape any robust performance evaluation. First is the selection of metrics (e.g., precision, recall, F1-score): each metric prioritizes different types of errors, with precision penalizing false positives and recall penalizing false negatives. Second is balancing multiple, often competing, metrics. Third is ensuring metrics are measured against truly representative data, which itself requires normative decisions about what constitutes a fair test. Fourth is determining the acceptance threshold: the specific performance level deemed 'appropriate.' For each choice, the researchers show how technical implementation embeds assumptions about acceptable risks and trade-offs, directly impacting the practical enforcement of the AI Act.
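The interplay between the first and fourth choices can be made concrete with a small sketch. The scores and labels below are hypothetical (not from the paper); the point is that the same classifier, evaluated at two different acceptance thresholds, reports different precision, recall, and F1, so "how accurate is this model?" has no single answer until those choices are made.

```python
def confusion(scores, labels, threshold):
    """Count true/false positives and negatives at a decision threshold."""
    tp = fp = fn = tn = 0
    for s, y in zip(scores, labels):
        pred = s >= threshold
        if pred and y:
            tp += 1
        elif pred and not y:
            fp += 1
        elif not pred and y:
            fn += 1
        else:
            tn += 1
    return tp, fp, fn, tn

def metrics(tp, fp, fn):
    """Precision, recall, and F1 from confusion counts."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return precision, recall, f1

# Hypothetical model scores and ground-truth labels (1 = positive).
scores = [0.95, 0.80, 0.65, 0.55, 0.40, 0.30, 0.20, 0.10]
labels = [1,    1,    0,    1,    1,    0,    0,    0]

# A strict vs. a lenient acceptance threshold for the same model.
for thr in (0.5, 0.3):
    tp, fp, fn, tn = confusion(scores, labels, thr)
    p, r, f1 = metrics(tp, fp, fn)
    print(f"threshold={thr}: precision={p:.2f} recall={r:.2f} F1={f1:.2f}")
# threshold=0.5: precision=0.75 recall=0.75 F1=0.75
# threshold=0.3: precision=0.67 recall=1.00 F1=0.80
```

Lowering the threshold catches every true positive (recall 1.0) at the cost of more false alarms (precision drops): exactly the kind of trade-off the paper argues is a value judgment, not a technical default.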
By making these implicit trade-offs explicit, the paper provides critical guidance for regulators setting standards, auditors assessing compliance, and developers building high-risk systems. It bridges the gap between abstract legal safety requirements and concrete technical practice, arguing that responsible AI governance requires acknowledging that accuracy is not just a number but a reflection of societal values and priorities.
- The EU AI Act's 'appropriate accuracy' mandate for high-risk AI forces four key techno-normative choices in evaluation.
- These choices—metric selection, balancing, data representativeness, and threshold setting—embed value judgments about acceptable risks and errors.
- The research provides a framework to help regulators, auditors, and developers translate legal safety requirements into technical practice.
Why It Matters
Forces AI builders and regulators to confront the ethical trade-offs hidden behind every performance metric, moving beyond simplistic benchmarks.