FHIR data format choice boosts LLM medication reconciliation by up to 19 F1 points

arXiv cs.CL April 24, 2026

⚡ How you format FHIR data can make or break LLM accuracy in clinical handoffs.

Deep Dive

A new preprint by Sanjoy Pator, posted to arXiv, presents the first systematic comparison of four FHIR serialisation strategies for LLM-based medication reconciliation, a high-stakes task in clinical handoffs. The study tested Raw JSON, Markdown Table, Clinical Narrative, and Chronological Timeline across five open-weight models (Phi-3.5-mini, Mistral-7B, BioMistral-7B, Llama-3.1-8B, Llama-3.3-70B) on 200 synthetic patients, for a total of 4,000 inference runs (4 strategies × 5 models × 200 patients).

The results show that serialisation strategy has a large, statistically significant effect on performance for models up to 8B parameters: Clinical Narrative outperforms Raw JSON by up to 19 F1 points for Mistral-7B (r = 0.617, p < 10^{-10}). This advantage reverses at 70B, where Raw JSON achieves the best mean F1 of 0.9956.

Across all 20 model-strategy combinations, mean precision exceeds mean recall, indicating that omission (missing active medications), not fabrication, is the dominant failure mode. Smaller models plateau at roughly 7-10 concurrent active medications, leaving polypharmacy patients systematically underserved. BioMistral-7B, a domain-pretrained model without instruction tuning, produced zero usable output, suggesting that domain pretraining alone is insufficient for structured extraction.

The study offers evidence-based recommendations: Clinical Narrative for models up to 8B, Raw JSON for 70B and above. The complete pipeline is reproducible using open-source tools on an AWS instance with an NVIDIA L40S (48 GB VRAM).
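
To make the four serialisation strategies concrete, here is a minimal Python sketch of how the same medication orders might be rendered in each format. The preprint's actual templates are not reproduced in this summary, so the function names, sentence wording, table columns, and sample data below are all assumptions for illustration; only the field names (`status`, `medicationCodeableConcept`, `dosageInstruction`, `authoredOn`) come from the FHIR R4 MedicationRequest resource.

```python
import json

# Hypothetical sample data: two minimal FHIR R4 MedicationRequest resources.
MEDICATION_REQUESTS = [
    {
        "resourceType": "MedicationRequest",
        "status": "active",
        "medicationCodeableConcept": {"text": "Metformin 500 mg oral tablet"},
        "dosageInstruction": [{"text": "1 tablet twice daily"}],
        "authoredOn": "2025-01-10",
    },
    {
        "resourceType": "MedicationRequest",
        "status": "stopped",
        "medicationCodeableConcept": {"text": "Lisinopril 10 mg oral tablet"},
        "dosageInstruction": [{"text": "1 tablet once daily"}],
        "authoredOn": "2024-06-02",
    },
]


def _fields(res):
    """Pull out the four fields every strategy below relies on."""
    name = res["medicationCodeableConcept"]["text"]
    dosage = "; ".join(d["text"] for d in res.get("dosageInstruction", []))
    return name, dosage, res["status"], res["authoredOn"]


def to_raw_json(resources):
    """Raw JSON: hand the FHIR resources to the model unchanged."""
    return json.dumps(resources, indent=2)


def to_markdown_table(resources):
    """Markdown Table: one row per order (column choice is an assumption)."""
    rows = ["| Medication | Dosage | Status | Ordered |", "| --- | --- | --- | --- |"]
    for res in resources:
        rows.append("| {} | {} | {} | {} |".format(*_fields(res)))
    return "\n".join(rows)


def to_clinical_narrative(resources):
    """Clinical Narrative: one plain-English sentence per order (assumed wording)."""
    sentences = []
    for res in resources:
        name, dosage, status, ordered = _fields(res)
        sentences.append(
            f"{name} was ordered on {ordered} ({dosage}); "
            f"the order is currently {status}."
        )
    return " ".join(sentences)


def to_chronological_timeline(resources):
    """Chronological Timeline: orders sorted by date, oldest first."""
    lines = []
    # ISO 8601 dates sort correctly as plain strings.
    for res in sorted(resources, key=lambda r: r["authoredOn"]):
        name, dosage, status, ordered = _fields(res)
        lines.append(f"{ordered}: {name}, {dosage} (status: {status})")
    return "\n".join(lines)


if __name__ == "__main__":
    print(to_clinical_narrative(MEDICATION_REQUESTS))
```

Each function returns a string that could be dropped into a prompt ahead of the reconciliation question. Following the study's recommendation, a pipeline targeting a model of 8B parameters or fewer would likely reach for something like `to_clinical_narrative`, while one targeting a 70B model could pass `to_raw_json` output directly.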

Key Points

Clinical Narrative format boosts Mistral-7B by 19 F1 points over Raw JSON for medication reconciliation.
At 70B parameters, Raw JSON achieves 0.9956 F1, reversing the advantage seen with smaller models.
Smaller models plateau at 7-10 active medications, leaving polypharmacy patients underserved.
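
The finding that precision exceeds recall is easier to read with the arithmetic spelled out. The sketch below scores a predicted medication list against a reference list using set-based precision, recall, and F1. The exact matching rule used in the preprint is not stated in this summary, so the case-insensitive exact-name match, the `normalize` and `score` helpers, and the placeholder drug names are assumptions.

```python
def normalize(name):
    """Lowercase and collapse whitespace so trivial formatting differences match."""
    return " ".join(name.lower().split())


def score(predicted, gold):
    """Return (precision, recall, f1) for two lists of medication names."""
    pred = {normalize(m) for m in predicted}
    ref = {normalize(m) for m in gold}
    true_positives = len(pred & ref)
    precision = true_positives / len(pred) if pred else 0.0
    recall = true_positives / len(ref) if ref else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


# Hypothetical polypharmacy patient with 10 active medications.
gold = [f"Drug {i}" for i in range(10)]

# Omission: the model lists only 7 medications, all of them correct.
omission = score(gold[:7], gold)

# Fabrication: the model lists all 10 plus 3 medications that do not exist.
fabrication = score(gold + ["Fake A", "Fake B", "Fake C"], gold)

if __name__ == "__main__":
    print("omission    P=%.2f R=%.2f F1=%.2f" % omission)
    print("fabrication P=%.2f R=%.2f F1=%.2f" % fabrication)
```

In the omission case precision stays at 1.0 while recall drops to 0.7, which is the pattern the study reports across all 20 model-strategy combinations. A model that plateaus at 7 medications for a 10-medication patient produces exactly this signature.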

Why It Matters

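How FHIR data is serialised can substantially change LLM accuracy on medication reconciliation, at least on synthetic patients. Because the dominant failure is omitting active medications, matching the format to the model size is a cheap way to lower the risk of medication errors at clinical handoffs.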
