AI Safety

Decision Evidence Maturity Model for Agentic AI: A Property-Level Method Specification

Agentic AI's 'container fallacy' leaves governance blind spots—DEMM closes them.

Deep Dive

A new paper by Oleg Solozobov introduces the Decision Evidence Maturity Model (DEMM), a property-level method for auditing agentic AI systems. The work addresses what the author calls the 'container fallacy': the mistaken assumption that collecting decision evidence (logs, telemetry) automatically means the evidence is sufficient to answer specific governance questions. DEMM classifies evidence sufficiency into four executable categories plus a 'conflicting' category, then aggregates per-property verdicts into a five-level capability rubric similar to established maturity models. This lets external auditors gauge how much trust to place in an AI agent's decision records.
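To make the property-level idea concrete, here is a minimal sketch of how per-property sufficiency verdicts could roll up into a maturity level. The category names, property names, and aggregation rule are all illustrative assumptions; the paper defines its own labels and rubric.

```python
from enum import Enum

# Hypothetical sufficiency categories. DEMM specifies four executable
# categories plus 'conflicting'; these labels are placeholders, not the
# paper's terminology.
class Sufficiency(Enum):
    SUFFICIENT = 3       # evidence fully answers the governance question
    PARTIAL = 2          # evidence answers it only in part
    RECONSTRUCTIBLE = 1  # answer derivable from other recorded evidence
    ABSENT = 0           # no usable evidence
    CONFLICTING = -1     # evidence sources disagree with each other

def maturity_level(verdicts: dict[str, Sufficiency]) -> int:
    """Aggregate per-property verdicts into a 1-5 capability level.

    Illustrative rule only: any conflicting property pins the system at
    level 1; otherwise the level tracks the weakest property's verdict.
    """
    if any(v is Sufficiency.CONFLICTING for v in verdicts.values()):
        return 1
    weakest = min(v.value for v in verdicts.values())
    return weakest + 2  # maps ABSENT..SUFFICIENT (0..3) onto levels 2..5

verdicts = {
    "actor_identity": Sufficiency.SUFFICIENT,
    "tool_call_inputs": Sufficiency.PARTIAL,
    "approval_chain": Sufficiency.RECONSTRUCTIBLE,
}
print(maturity_level(verdicts))  # -> 3: the weakest property caps the level
```

A weakest-link rule is one natural choice here, since an auditor can only trust the records as far as the least-evidenced property allows.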

The paper also ships the Decision Trace Reconstructor v0.1.0, an open-source tool (Apache-2.0) with ten adapter-fallback classes covering major vendor SDKs, protocol traces, public-postmortem prose, and generic JSONL records. In a feasibility exercise across 140 synthetic scenarios and three public incidents, the tool achieved a completeness range of 53.6% to 100%. Solozobov emphasizes this range reflects implementation behavior, not external validation. The work includes a companion Decision Event Schema (MIT licensed). For professionals building or auditing autonomous agents, DEMM provides a much-needed standardized framework to evaluate whether decision evidence truly supports post-hoc governance demands.
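As a rough sketch of the adapter-fallback pattern the Reconstructor's design suggests, the snippet below tries format-specific adapters first and falls back to a generic JSONL parser, then reports completeness as the fraction of schema fields it managed to populate. The class names, the `parse` interface, and the schema fields are invented for illustration and are not the tool's actual API.

```python
import json

# Illustrative decision-event fields; the companion Decision Event Schema
# defines the real field set.
SCHEMA_FIELDS = ("timestamp", "actor", "action", "inputs", "outcome")

class VendorSdkAdapter:
    """Hypothetical vendor-specific adapter with a strict format check."""
    def parse(self, raw: str) -> dict | None:
        try:
            record = json.loads(raw)
        except json.JSONDecodeError:
            return None
        if record.get("sdk") != "acme":  # not this vendor's trace format
            return None
        return {"timestamp": record["ts"], "actor": record["agent"],
                "action": record["op"]}

class JsonlAdapter:
    """Generic last-resort adapter: any JSON object with known keys."""
    def parse(self, raw: str) -> dict | None:
        try:
            record = json.loads(raw)
        except json.JSONDecodeError:
            return None
        return {k: record[k] for k in SCHEMA_FIELDS if k in record} or None

def reconstruct(raw: str, adapters) -> tuple[dict, float]:
    """Run the fallback chain; return the event and a completeness ratio."""
    for adapter in adapters:
        event = adapter.parse(raw)
        if event is not None:
            return event, len(event) / len(SCHEMA_FIELDS)
    return {}, 0.0

trace = '{"sdk": "acme", "ts": "2025-01-01T00:00:00Z", "agent": "planner", "op": "tool_call"}'
event, completeness = reconstruct(trace, [VendorSdkAdapter(), JsonlAdapter()])
print(f"{completeness:.0%}")  # -> 60%: 3 of 5 schema fields recovered
```

Under this framing, a completeness score below 100% flags exactly the kind of gap the container fallacy hides: a trace existed, but not every governance-relevant field could be recovered from it.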

Key Points
  • Identifies the 'container fallacy' – the false equivalence between having evidence logs and having audit-sufficient evidence
  • DEMM classifies property-level evidence sufficiency into 4 executable categories plus 1 'conflicting' category, then aggregates verdicts into 5 maturity levels
  • Open-source Decision Trace Reconstructor ships 10 adapter-fallback classes covering vendor SDKs, protocol traces, and JSONL, with completeness ranging from 53.6% to 100% across synthetic scenarios and public incidents (a measure of implementation behavior, not external validation)

Why It Matters

Provides a standardized way to audit agentic AI decisions – crucial for governance, compliance, and debugging autonomous systems.