
FHIRPath-QA: Executable Question Answering over FHIR Electronic Health Records

Researchers propose shifting clinical AI from text generation to executable code synthesis, aiming to cut LLM usage and hallucinations.

Deep Dive

Researchers from the University of British Columbia and Vector Institute have introduced FHIRPath-QA, an open dataset and benchmark aimed at a critical problem in healthcare AI: getting precise, trustworthy answers from electronic health records (EHRs). Patients increasingly have digital access to their records, but existing interfaces and standard retrieval-augmented generation (RAG) approaches built on large language models (LLMs) are computationally inefficient, prone to hallucination, and difficult to deploy. FHIRPath-QA proposes a different paradigm: instead of generating a free-text answer, the model synthesizes an executable FHIRPath query (FHIRPath is the standardized path-based query language for FHIR, or Fast Healthcare Interoperability Resources, data), and that query is then run against the record to produce the answer. This approach promises to reduce LLM usage and improve the safety and reliability of AI-powered health applications.
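To make the paradigm concrete: the model's only job is to turn a question such as "What are my given names?" into an expression like `Patient.name.given`, and a deterministic engine, not the LLM, produces the answer. The sketch below is a toy evaluator covering just one small slice of FHIRPath, plain dotted member navigation where each step is applied to every item in the current collection and list results are flattened. The function name `evaluate_path` and the sample resource are hypothetical illustrations; it is not a conformant FHIRPath engine and is not the paper's tooling.

```python
from typing import Any


def evaluate_path(resource: dict[str, Any], path: str) -> list[Any]:
    """Evaluate a dotted FHIRPath-style path such as 'Patient.name.given'.

    Toy subset only (an assumption for illustration): member navigation with
    collection flattening. No functions, operators, or type checking.
    """
    steps = path.split(".")
    # The first step names the resource type; a mismatch yields an empty result.
    if steps[0] != resource.get("resourceType"):
        return []
    collection: list[Any] = [resource]
    for step in steps[1:]:
        next_items: list[Any] = []
        for item in collection:
            if not isinstance(item, dict) or step not in item:
                continue  # missing elements contribute nothing
            value = item[step]
            if isinstance(value, list):
                next_items.extend(value)  # flatten repeating elements
            else:
                next_items.append(value)
        collection = next_items
    return collection


# Hypothetical FHIR Patient resource, not taken from MIMIC-IV.
patient = {
    "resourceType": "Patient",
    "birthDate": "1970-01-01",
    "name": [{"family": "Doe", "given": ["Jane", "Q"]}, {"given": ["J."]}],
}

print(evaluate_path(patient, "Patient.name.given"))     # ['Jane', 'Q', 'J.']
print(evaluate_path(patient, "Patient.birthDate"))      # ['1970-01-01']
print(evaluate_path(patient, "Patient.telecom.value"))  # [] (element absent)
```

Because the answer comes from executing the query, an empty result signals that the record holds no such data, rather than leaving room for the model to invent a plausible-sounding answer.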

The dataset, built on the real-world MIMIC-IV on FHIR Demo, pairs over 14,000 natural language questions—phrased by both patients and clinicians—with validated FHIRPath queries and their answers. The research demonstrates that current state-of-the-art LLMs, including GPT-4 and Claude 3, struggle with the ambiguity in patient language and perform poorly at synthesizing these executable queries directly. However, the study shows they benefit strongly from supervised fine-tuning on this specific task. By providing this dataset and benchmark, the team aims to establish a practical foundation for safe, efficient, and interoperable consumer health tools, moving beyond unreliable text generation to verifiable, code-based answers. The full dataset and generation code are publicly available, setting a new starting point for research in executable clinical question answering.
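One natural way to use question/query/answer triples like these is execution-based scoring: run a model's predicted query and compare its output with the gold answer, so that a differently written but equivalent query still counts as correct. The article does not describe the released file format or the benchmark's exact metric, so the `QAExample` fields, the `execution_match` helper, and the order-insensitive comparison below are all assumptions, shown with a hypothetical record and the same toy path evaluator as a self-contained sketch.

```python
from collections import Counter
from dataclasses import dataclass
from typing import Any


@dataclass
class QAExample:
    """One hypothetical benchmark item; field names are illustrative only."""
    question: str      # natural-language question
    persona: str       # who phrased it: "patient" or "clinician"
    fhirpath: str      # gold FHIRPath query
    answer: list[Any]  # gold answer obtained by executing the gold query


def evaluate_path(resource: dict[str, Any], path: str) -> list[Any]:
    """Toy evaluator: dotted member navigation with list flattening only."""
    steps = path.split(".")
    if steps[0] != resource.get("resourceType"):
        return []
    collection: list[Any] = [resource]
    for step in steps[1:]:
        next_items: list[Any] = []
        for item in collection:
            if isinstance(item, dict) and step in item:
                value = item[step]
                next_items.extend(value if isinstance(value, list) else [value])
        collection = next_items
    return collection


def execution_match(predicted_query: str, example: QAExample,
                    resource: dict[str, Any]) -> bool:
    """True if the predicted query returns the gold answer (order-insensitive)."""
    predicted = evaluate_path(resource, predicted_query)
    # repr() makes dict/list results hashable so Counter can compare multisets.
    return Counter(map(repr, predicted)) == Counter(map(repr, example.answer))


# Hypothetical resource and example, not drawn from the dataset.
med_request = {
    "resourceType": "MedicationRequest",
    "status": "active",
    "medicationCodeableConcept": {"coding": [{"display": "Aspirin 81 mg"}]},
}
example = QAExample(
    question="What medicine was I prescribed?",
    persona="patient",
    fhirpath="MedicationRequest.medicationCodeableConcept.coding.display",
    answer=["Aspirin 81 mg"],
)

print(execution_match(example.fhirpath, example, med_request))  # True
print(execution_match("MedicationRequest.medicationCodeableConcept.text",
                      example, med_request))  # False: the path returns []
```

Comparing executed outputs instead of query strings is the design choice that makes scoring verifiable: the check depends only on what the record actually returns.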

Key Points
  • Introduces the first open dataset with over 14,000 patient/clinician questions paired with executable FHIRPath queries for real-world EHR data.
  • Proposes a text-to-FHIRPath synthesis paradigm that reduces reliance on LLMs for generation, cutting computational costs and hallucination risks.
  • Shows current LLMs like GPT-4 struggle with query synthesis but improve significantly with fine-tuning, providing a clear path for future model development.

Why It Matters

Enables safer, verifiable AI health assistants by moving from unreliable text generation to executable, standardized code queries.