EndoGov: A knowledge-governed multi-agent expert system for endometrial cancer risk stratification
AI that follows doctors' rules hits 0.973 macro AUC in cancer risk stratification
A team of researchers from multiple institutions has introduced EndoGov, a knowledge-governed multi-agent expert system designed to enforce clinical guideline compliance in endometrial cancer (EC) risk stratification. Standard multimodal AI models optimize for aggregate accuracy and can ignore mandatory clinical overrides, such as assigning POLE-mutated tumors to the low-risk group regardless of high-grade morphology. EndoGov instead splits decision-making into two tiers. Tier 1 deploys specialist agents (pathology, molecular, and clinical) that independently generate schema-constrained reports from frozen foundation-model features or structured records. Tier 2 then queries an evidence-level-weighted Guideline Knowledge Graph, applying deterministic hard-path rules for high-priority overrides and constrained soft-path reasoning for ambiguous cases.
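The hard-path/soft-path split described above can be illustrated with a minimal Python sketch. This is not EndoGov's implementation: the `SpecialistReports` fields, the `hard_path` and `govern` functions, and the risk labels are hypothetical names chosen for illustration, and the soft-path result is reduced to a placeholder string. The only rule encoded is the one the text names, POLE-mutated tumors going to the low-risk group regardless of grade.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass(frozen=True)
class SpecialistReports:
    """Hypothetical Tier 1 output: one structured field per specialist agent."""
    pole_mutated: bool   # assumed field from the molecular agent
    high_grade: bool     # assumed field from the pathology agent
    soft_path_risk: str  # placeholder for the constrained soft-path result


def hard_path(reports: SpecialistReports) -> Optional[str]:
    """Deterministic override rules; returns a mandated label or None."""
    # The override named in the text: a POLE mutation forces low risk,
    # so reports.high_grade is deliberately not consulted here.
    if reports.pole_mutated:
        return "low"
    return None


def govern(reports: SpecialistReports) -> Tuple[str, str]:
    """Tier 2 sketch: hard-path overrides win; otherwise use the soft path."""
    override = hard_path(reports)
    if override is not None:
        return override, "hard-path"
    return reports.soft_path_risk, "soft-path"


# A high-grade but POLE-mutated tumor is still assigned to the low-risk group.
label, route = govern(SpecialistReports(True, True, "high"))
```

Returning the route alongside the label is one simple way to keep each decision auditable, since it records whether a deterministic rule or the soft path produced the result.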
On the TCGA-UCEC cohort (n=541), EndoGov achieved 0.943 accuracy, 0.973 macro AUC, and a conditional logic-violation rate (C-LVR) of 0.93% among trigger-exposed cases, meaning it rarely broke a guideline rule in the cases where one applied. On the CPTAC-UCEC cohort (n=95), where reference labels are guideline-derived, EndoGov reached 0.842 accuracy compared with less than 0.31 for locked-transfer neural baselines, indicating that the governance pathway transfers under distribution shift. End-to-end safety decomposition showed that residual failures stemmed primarily from upstream molecular detection, not the governance layer. Backend-swap experiments further indicated that hard-path compliance is invariant to the LLM backend, so rule enforcement does not depend on which model is used. This work offers a practical blueprint for auditable, guideline-compliant AI in high-stakes medical decisions.
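To make the C-LVR figure concrete, here is a small Python sketch of how a conditional violation rate could be computed. The definition used is an assumption, not taken from the paper: a case counts as trigger-exposed when some hard-path rule mandates a label for it, and as a violation when the predicted label differs from that mandated label. The function name `conditional_lvr` and the toy cases are illustrative only and are not data from the study.

```python
from typing import Iterable, Optional, Tuple

Case = Tuple[Optional[str], str]  # (mandated label or None, predicted label)


def conditional_lvr(cases: Iterable[Case]) -> float:
    """Share of trigger-exposed cases whose prediction breaks the mandated label.

    Assumed definition: only cases with a mandated label enter the denominator.
    """
    exposed = [(mandated, predicted) for mandated, predicted in cases
               if mandated is not None]
    if not exposed:
        raise ValueError("C-LVR is undefined when no case is trigger-exposed")
    violations = sum(1 for mandated, predicted in exposed if mandated != predicted)
    return violations / len(exposed)


# Toy data: three trigger-exposed cases, one of which violates its rule.
toy_cases = [
    ("low", "low"),
    ("low", "high"),          # violation: the rule mandates "low"
    (None, "high"),           # not trigger-exposed, excluded from the rate
    ("low", "low"),
    (None, "intermediate"),   # not trigger-exposed, excluded from the rate
]
rate = conditional_lvr(toy_cases)
```

Conditioning on trigger-exposed cases keeps the metric from being diluted by the many cases in which no override rule applies at all.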
- Two-tier architecture: specialist agents extract structured evidence, and a governance tier enforces the guideline rule set
- Achieved 0.973 macro AUC and 0.943 accuracy on TCGA-UCEC (n=541), with a 0.93% conditional logic-violation rate among trigger-exposed cases
- Outperformed locked-transfer neural baselines (0.842 vs. less than 0.31 accuracy) on CPTAC-UCEC (n=95) under distribution shift
Why It Matters
Shows a path to auditable, guideline-compliant AI for endometrial cancer risk stratification that reduces dangerous rule violations in clinical decisions.