
Serialisation Strategy Matters: How FHIR Data Format Affects LLM Medication Reconciliation

How you format FHIR data can make or break LLM accuracy in clinical handoffs.

Deep Dive

A new preprint from Sanjoy Pator, posted on arXiv, presents the first systematic comparison of four FHIR serialisation strategies for LLM-based medication reconciliation, a high-stakes task in clinical handoffs. The study tested Raw JSON, Markdown Table, Clinical Narrative, and Chronological Timeline across five open-weight models (Phi-3.5-mini, Mistral-7B, BioMistral-7B, Llama-3.1-8B, Llama-3.3-70B) on 200 synthetic patients, for a total of 4,000 inference runs (4 strategies × 5 models × 200 patients).

The results show that serialisation strategy has a large, statistically significant effect on performance for models up to 8B parameters: Clinical Narrative outperforms Raw JSON by up to 19 F1 points for Mistral-7B (r = 0.617, p < 10^{-10}). This advantage reverses at 70B, where Raw JSON achieves the best mean F1 of 0.9956.

Across all 20 model-strategy combinations, mean precision exceeds mean recall, indicating that omission (missing an active medication) is the dominant failure mode, not fabrication. Smaller models plateau at roughly 7-10 concurrent active medications, leaving polypharmacy patients systematically underserved. BioMistral-7B, a domain-pretrained model without instruction tuning, produced zero usable output, which suggests that domain pretraining alone is insufficient for structured extraction.

The study offers evidence-based recommendations: Clinical Narrative for models up to 8B, Raw JSON for 70B and above. The complete pipeline is reproducible using open-source tools on an AWS instance with an NVIDIA L40S (48 GB VRAM).
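To make the four strategies concrete, here is a minimal Python sketch that renders the same medication list in each format. It is illustrative only: the preprint's actual templates are not given in this summary, so the sentence wording, table columns, helper names (fields, to_raw_json, and so on) and the two synthetic MedicationRequest resources are all assumptions. Only a handful of standard FHIR R4 fields are used.

```python
import json

# Hypothetical, simplified FHIR R4 MedicationRequest resources (synthetic data).
MEDS = [
    {
        "resourceType": "MedicationRequest",
        "status": "active",
        "medicationCodeableConcept": {"text": "Metformin 500 mg tablet"},
        "dosageInstruction": [{"text": "1 tablet twice daily"}],
        "authoredOn": "2023-01-10",
    },
    {
        "resourceType": "MedicationRequest",
        "status": "stopped",
        "medicationCodeableConcept": {"text": "Lisinopril 10 mg tablet"},
        "dosageInstruction": [{"text": "1 tablet once daily"}],
        "authoredOn": "2022-06-01",
    },
]


def fields(med):
    """Return (name, status, dose, date) for one resource."""
    name = med["medicationCodeableConcept"]["text"]
    dose = med["dosageInstruction"][0]["text"]
    return name, med["status"], dose, med["authoredOn"]


def to_raw_json(meds):
    # Raw JSON: the resources passed through unchanged.
    return json.dumps(meds, indent=2)


def to_markdown_table(meds):
    # Markdown Table: one row per medication order.
    rows = ["| Medication | Status | Dosage | Authored |",
            "| --- | --- | --- | --- |"]
    for med in meds:
        rows.append("| {} | {} | {} | {} |".format(*fields(med)))
    return "\n".join(rows)


def to_clinical_narrative(meds):
    # Clinical Narrative: one prose sentence per order (assumed wording).
    sentences = []
    for med in meds:
        name, status, dose, date = fields(med)
        sentences.append(
            f"On {date} the patient was prescribed {name}, {dose}; "
            f"this order is currently {status}."
        )
    return " ".join(sentences)


def to_timeline(meds):
    # Chronological Timeline: orders sorted by ISO date, oldest first.
    ordered = sorted(meds, key=lambda m: m["authoredOn"])
    return "\n".join(
        f"{date}: {name} ({dose}) [{status}]"
        for name, status, dose, date in map(fields, ordered)
    )
```

Calling print(to_clinical_narrative(MEDS)) next to print(to_raw_json(MEDS)) shows the same two orders at very different levels of syntactic overhead, which is the kind of contrast the study measures.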

Key Points
  • Clinical Narrative format improves Mistral-7B by up to 19 F1 points over Raw JSON for medication reconciliation.
  • At 70B parameters, Raw JSON achieves the best mean F1 (0.9956), reversing the advantage seen with smaller models.
  • Smaller models plateau at roughly 7-10 concurrent active medications, leaving polypharmacy patients underserved.
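
For readers less familiar with the metrics, the sketch below shows how precision, recall and F1 could be computed for a single patient's reconciled medication list, and why omissions lower recall while leaving precision untouched. The function name reconciliation_scores and the matching rule (case-insensitive exact string match) are assumptions made for illustration; the preprint's own matching procedure is not described in this summary.

```python
def reconciliation_scores(predicted, reference):
    """Return (precision, recall, f1) for one patient's active-medication list.

    Matching is a simplifying assumption: case-insensitive exact name match.
    """
    pred = {name.strip().lower() for name in predicted}
    ref = {name.strip().lower() for name in reference}
    true_positives = len(pred & ref)

    # Precision falls when the model fabricates medications.
    precision = true_positives / len(pred) if pred else 0.0
    # Recall falls when the model omits active medications.
    recall = true_positives / len(ref) if ref else 0.0

    if precision + recall == 0:
        return precision, recall, 0.0
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


# Synthetic example: the model lists three of four active medications
# and invents none, so the only error is an omission.
reference = ["Metformin", "Atorvastatin", "Amlodipine", "Sertraline"]
predicted = ["metformin", "Atorvastatin", "Amlodipine"]
precision, recall, f1 = reconciliation_scores(predicted, reference)
```

In this example precision is 1.0 and recall is 0.75, the same precision-above-recall pattern the study reports as typical.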

Why It Matters

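In this study of synthetic patients, the choice of serialisation format alone shifted medication reconciliation accuracy by up to 19 F1 points for models up to 8B parameters. A simple formatting decision can therefore affect how many active medications an LLM misses at clinical handoff.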