Detecting Clinical Discrepancies in Health Coaching Agents: A Dual-Stream Memory and Reconciliation Architecture
A dual-stream memory system flags conflicts between patient self-report and clinical records in longitudinal health coaching agents.
Samuel L. Pugh and colleagues propose a Dual-Stream Memory Architecture for LLM-based health coaching agents. It addresses the challenge of reconciling patient self-report, which is current but biased, with Electronic Health Records (EHRs), which are validated but often outdated. The system strictly separates patient narrative from structured clinical data in FHIR format, and a dedicated Reconciliation Engine classifies every extracted memory against the patient's EHR profile, flagging discrepancies by type, severity, and the FHIR resources involved. In tests on 26 patients across 675 longitudinal wellness coaching sessions (using a hybrid dataset of real transcripts and synthetic FHIR-grounded scenarios), the engine detected 84.4% of the clinical discrepancies designed into the scenarios, with 86.7% recall on safety-critical ones.
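The reconciliation step described above can be illustrated with a minimal Python sketch. It is not the authors' implementation: the class names, the two discrepancy types (`status_conflict`, `not_in_record`), the severity rule, and the simplified `EhrEntry` stand-in for a FHIR resource are all assumptions made for illustration.

```python
from dataclasses import dataclass
from enum import Enum
from typing import List, Optional, Tuple


class Severity(Enum):
    INFO = "info"
    SAFETY_CRITICAL = "safety_critical"


@dataclass(frozen=True)
class PatientMemory:
    """A fact extracted from the patient's narrative (self-report stream)."""
    session_id: int
    kind: str    # e.g. "medication"
    name: str    # e.g. "warfarin"
    status: str  # e.g. "active" or "stopped"


@dataclass(frozen=True)
class EhrEntry:
    """Hypothetical, heavily simplified stand-in for a FHIR resource."""
    resource_type: str  # e.g. "MedicationRequest"
    resource_id: str
    kind: str
    name: str
    status: str


@dataclass(frozen=True)
class Discrepancy:
    dtype: str                  # hypothetical type label
    severity: Severity
    fhir_refs: Tuple[str, ...]  # references to the EHR entries involved
    memory: PatientMemory


def reconcile(memory: PatientMemory, ehr: List[EhrEntry]) -> Optional[Discrepancy]:
    """Classify one extracted memory against the EHR profile.

    Returns None when the memory agrees with the record.
    """
    matches = [
        e for e in ehr
        if e.kind == memory.kind and e.name.lower() == memory.name.lower()
    ]
    if not matches:
        # Patient mentions something the record does not contain.
        return Discrepancy("not_in_record", Severity.INFO, (), memory)

    conflicting = [e for e in matches if e.status != memory.status]
    if not conflicting:
        return None

    # Assumed rule: medication conflicts are treated as safety-critical.
    severity = (
        Severity.SAFETY_CRITICAL if memory.kind == "medication" else Severity.INFO
    )
    refs = tuple(f"{e.resource_type}/{e.resource_id}" for e in conflicting)
    return Discrepancy("status_conflict", severity, refs, memory)
```

For example, if the record lists warfarin as active and the patient says in a session that they stopped taking it, `reconcile` returns a `status_conflict` discrepancy that points back to the matching record entry instead of silently accepting either source.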
The study also quantified a 13.6% error cascade and traced the degradation to clinical details lost when memories are extracted from unstructured conversation, not to downstream classification errors. Taken together, the results suggest that validating patient-reported memories against clinical records is both feasible and necessary for safe deployment of persistent health agents. The work also highlights a key risk in existing general-purpose memory systems: they optimize for coherence by overwriting older facts with the latest user statement, a pattern that can lead to safety failures in clinical contexts. The authors position the architecture as a step toward safer, more reliable AI-assisted health coaching.
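The overwrite risk can be made concrete with a toy contrast between a last-write-wins store and a store that keeps the two streams apart. Both classes below are hypothetical illustrations, not code from the paper or from any existing memory library.

```python
class OverwriteMemory:
    """Last-write-wins store: the newest statement replaces the old fact."""

    def __init__(self):
        self.facts = {}

    def update(self, key, value):
        self.facts[key] = value  # the earlier value is lost


class DualStreamMemory:
    """Keeps the clinical record and the patient narrative in separate streams."""

    def __init__(self, ehr):
        self.ehr = dict(ehr)  # clinical stream, never changed by conversation
        self.narrative = []   # patient stream, append-only

    def update(self, key, value):
        self.narrative.append((key, value))

    def conflicts(self):
        """Return (key, patient_value, ehr_value) for every disagreement."""
        return [
            (key, value, self.ehr[key])
            for key, value in self.narrative
            if key in self.ehr and self.ehr[key] != value
        ]
```

With the overwrite store, a patient saying they stopped a medication simply replaces the recorded status. With the two-stream store, the record stays intact and the disagreement remains visible for a reconciliation step to classify.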
- Dual-Stream Memory separates patient narrative from FHIR-structured EHR data, with a Reconciliation Engine classifying discrepancies.
- 84.4% detection rate for clinical discrepancies and 86.7% recall for safety-critical issues across 675 sessions.
- 13.6% error cascade identified, originating from memory extraction of unstructured conversation rather than downstream classification.
Why It Matters
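Persistent AI health coaches need a way to catch conflicts between what patients say and what their clinical records show. This work indicates that such checks are feasible and pinpoints memory extraction as the stage where clinical detail is still lost.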