COOL-MC: Verifying and Explaining RL Policies for Platelet Inventory Management
New research uses formal verification to make life-saving AI inventory decisions transparent and safe.
Researcher Dennis Gross presents COOL-MC, a tool for verifying and explaining the behavior of reinforcement learning (RL) policies in safety-critical problems. The case study is platelet inventory management, modeled as a Markov decision process in which blood banks must balance the risk of life-threatening shortages against costly wastage of a product that expires in just five days. RL can learn effective ordering strategies, but its neural network policies are typically black boxes, which is a major barrier to adoption in healthcare. COOL-MC addresses this by constructing the Markov chain induced by the trained policy and applying probabilistic model checking to it, so that properties such as the probability of a stockout can be verified formally rather than estimated from sample runs.
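The idea of a policy-induced Markov chain can be illustrated with a minimal sketch. Everything below is an assumption made for illustration: the three-level inventory, the demand distribution, the `threshold_policy` stand-in for a trained network, and the function names are hypothetical and are not taken from COOL-MC or from the paper's platelet model (which also tracks platelet age). Fixing the policy removes the choice of action from the decision process, leaving a plain Markov chain on which a bounded reachability probability can be computed exactly.

```python
from collections import deque

CAPACITY = 2                               # hypothetical maximum stock level
DEMAND = ((0, 0.3), (1, 0.5), (2, 0.2))    # hypothetical (demand, probability) pairs
STOCKOUT = "stockout"                      # absorbing failure state


def toy_transitions():
    """transitions[state][action] -> {successor: probability} for a toy inventory MDP."""
    transitions = {STOCKOUT: {0: {STOCKOUT: 1.0}, 1: {STOCKOUT: 1.0}}}
    for stock in range(CAPACITY + 1):
        transitions[stock] = {}
        for order in (0, 1):
            successors = {}
            for demand, prob in DEMAND:
                if demand > stock:
                    nxt = STOCKOUT         # demand cannot be met
                else:
                    nxt = min(CAPACITY, stock - demand + order)
                successors[nxt] = successors.get(nxt, 0.0) + prob
            transitions[stock][order] = successors
    return transitions


def threshold_policy(state):
    """Stand-in for a trained RL policy: order one unit unless stock is full."""
    if state == STOCKOUT:
        return 0
    return 1 if state < CAPACITY else 0


def induced_chain(transitions, policy, initial_state):
    """Explore only states reachable under the policy and keep the chosen action's row."""
    chain = {}
    frontier = deque([initial_state])
    while frontier:
        state = frontier.popleft()
        if state in chain:
            continue
        chain[state] = dict(transitions[state][policy(state)])
        frontier.extend(s for s in chain[state] if s not in chain)
    return chain


def bounded_reachability(chain, targets, initial_state, horizon):
    """Probability of reaching a target state within `horizon` steps (value iteration)."""
    values = {s: 1.0 if s in targets else 0.0 for s in chain}
    for _ in range(horizon):
        values = {
            s: 1.0 if s in targets
            else sum(p * values[t] for t, p in chain[s].items())
            for s in chain
        }
    return values[initial_state]


chain = induced_chain(toy_transitions(), threshold_policy, CAPACITY)
stockout_probability = bounded_reachability(chain, {STOCKOUT}, CAPACITY, 200)
print(f"P(stockout within 200 steps) = {stockout_probability:.4f}")
```

The toy numbers bear no relation to the 2.9% figure reported for the real model. A production workflow would hand the induced chain to a dedicated probabilistic model checker rather than iterate in pure Python.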
The verification results indicate that the RL policy performs robustly: over a 200-step horizon, the stockout probability is 2.9% and the inventory-full (wastage) probability is 1.1%. The explainability component shows that the policy's decisions are driven primarily by the age distribution of platelets in stock, rather than by secondary features such as the day of the week. Counterfactual and reachability analyses add further insight, revealing which order quantities the policy actually uses and showing that certain medium-large orders are placed only in well-buffered states. The work is presented as the first formal verification of an RL policy in this domain, and it demonstrates COOL-MC's value for building transparent, auditable AI systems for high-stakes decisions in healthcare and other critical supply chains.
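A figure such as "2.9% over a 200-step horizon" is typically the answer to a step-bounded reachability query on the induced chain. The notation below is a sketch of that standard formulation, not the exact property from the paper: the label stockout, the target set T, the transition matrix P, and the initial state s_0 are assumed names, and it is an assumption that the reported number is a reachability probability of this form.

```latex
% Assumed reading: probability of reaching a stockout state within 200 steps
P_{=?}\left[\, \Diamond^{\le 200}\ \mathit{stockout} \,\right]

% Standard recursion over the induced chain (transition matrix P, target set T)
x_0(s) = \begin{cases} 1 & s \in T \\ 0 & \text{otherwise} \end{cases}
\qquad
x_{k+1}(s) = \begin{cases} 1 & s \in T \\ \sum_{s'} P(s, s')\, x_k(s') & \text{otherwise} \end{cases}
```

Under this reading the reported value corresponds to x_{200}(s_0), and the inventory-full probability is obtained the same way with a different target set.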
- COOL-MC verifies an RL policy for platelet inventory management; over a 200-step horizon the policy has a 2.9% stockout probability and a 1.1% inventory-full (wastage) probability.
- The tool uses probabilistic model checking on a policy-induced Markov chain to provide formal guarantees and feature-level explanations.
- Analysis shows the policy relies mainly on the age distribution of the inventory and uses a varied, state-dependent ordering strategy.
Why It Matters
Enables deployment of trustworthy, explainable AI for critical healthcare logistics where safety and auditability are non-negotiable.