CareGuardAI: Context-Aware Multi-Agent Guardrails for Clinical Safety & Hallucination Mitigation in Patient-Facing LLMs
A new framework releases AI responses only when both the clinical safety risk and hallucination risk scores are acceptably low.
Patient-facing large language models (LLMs) hold promise for democratizing medical advice, but they risk providing contextually inappropriate or hallucinated answers. A new paper on arXiv presents CareGuardAI, a multi-agent guardrail system that tackles two core failure modes: clinical safety risk and factual unreliability. Developed by researchers at City University of Hong Kong and Virginia Tech, the framework includes a Clinical Safety Risk Assessment (SRA) modeled on ISO 14971 (the international standard for medical device risk management) and a Hallucination Risk Assessment (HRA). At inference time, a controller agent orchestrates safety-constrained generation, evaluates both risks, and iteratively refines responses until both SRA and HRA score ≤2, ensuring clinically acceptable outputs with bounded latency.
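The paper does not include reference code, but the control flow described above reads as a generate-assess-refine loop. Below is a minimal sketch under stated assumptions: the function names, the stub scoring logic, the risk scale, and the refinement cap are hypothetical placeholders; only the rule that both SRA and HRA must score ≤2 before release comes from the framework description.

```python
from dataclasses import dataclass

ACCEPTABLE_RISK = 2    # release criterion from the paper: both scores must be <= 2
MAX_REFINEMENTS = 3    # hypothetical cap to keep latency bounded

@dataclass
class Assessment:
    sra: int  # Clinical Safety Risk Assessment score (assumed scale: 1 negligible .. 5 severe)
    hra: int  # Hallucination Risk Assessment score (same assumed scale)

def generate(query: str, feedback: str | None = None) -> str:
    """Stand-in for the safety-constrained generation agent."""
    revised = f" (revised after: {feedback})" if feedback else ""
    return f"draft answer to {query!r}{revised}"

def assess(response: str) -> Assessment:
    """Stand-in for the SRA and HRA evaluator agents; real evaluators would score the text."""
    return Assessment(sra=1, hra=1)

def controller(query: str) -> str:
    """Controller agent: orchestrate generation, dual risk assessment, and refinement."""
    response = generate(query)
    for _ in range(MAX_REFINEMENTS):
        scores = assess(response)
        if scores.sra <= ACCEPTABLE_RISK and scores.hra <= ACCEPTABLE_RISK:
            return response  # both risks acceptable: release the answer
        # Otherwise, fold the risk findings back into another generation round.
        response = generate(query, feedback=f"SRA={scores.sra}, HRA={scores.hra}")
    # If the cap is hit without an acceptable answer, fail safe instead of releasing.
    return "I can't answer this safely; please consult a clinician."

if __name__ == "__main__":
    print(controller("Can I double my blood pressure dose if I missed yesterday's?"))
```

The fallback refusal after the refinement cap is also an assumption; it illustrates one way bounded latency could coexist with the acceptance criterion.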
The team evaluated CareGuardAI on three benchmarks: PatientSafeBench (real-world patient queries), MedSafetyBench (medical safety), and MedHallu (hallucination detection). Across all three, the framework consistently outperformed strong baselines, including GPT-4o-mini. The results highlight the value of context-aware, risk-based guardrails over simple confidence thresholds. By enforcing multi-agent coordination and iterative refinement, CareGuardAI addresses the tendency of LLMs to produce agreeable but medically unsafe responses. Because it is designed for open-ended, underspecified patient interactions, the system marks a major step toward trustworthy clinical AI deployment.
- Uses dual risk scores (SRA and HRA) inspired by ISO 14971 to reject unsafe or hallucinated responses.
- Multi-agent pipeline with controller, safety-constrained generation, and iterative refinement until risks ≤2.
- Outperformed GPT-4o-mini on PatientSafeBench, MedSafetyBench, and MedHallu benchmarks.
Why It Matters
CareGuardAI offers a practical blueprint for deploying LLMs in healthcare without compromising patient safety or factual accuracy.