Reproducible Synthetic Clinical Letters for Seizure Frequency Information Extraction
A team trained models on AI-generated NHS-style letters, reaching micro-F1 scores of up to 0.858 on real letters without training on real patient data.
A research team from King's College London and NHS partners has developed a framework for extracting seizure frequency data from clinical letters without compromising patient privacy. Their system uses a teacher language model to generate 15,000 fully synthetic yet medically accurate NHS-style clinic letters, complete with structured labels covering seizure rates, ranges, clusters, and seizure-free intervals. The synthetic dataset also includes rationales and evidence spans, and is designed to mimic real clinical documentation patterns.
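To make the idea of a structured, evidence-grounded label concrete, here is a minimal sketch of what one such record might look like. The field names, the `SeizureFrequencyLabel` class, the `evidence_is_grounded` helper, and the example sentence are all hypothetical illustrations, not the schema or data used by the researchers.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class SeizureFrequencyLabel:
    """Hypothetical label record; field names are illustrative only."""
    kind: str                    # "rate", "range", "cluster", or "seizure_free"
    count_low: Optional[float]   # lower bound on seizures per period
    count_high: Optional[float]  # upper bound; equals count_low for a single rate
    period: Optional[str]        # e.g. "day", "week", "month", "year"
    rationale: str               # short free-text justification for the label
    evidence: str                # text quoted from the letter


def evidence_is_grounded(letter: str, label: SeizureFrequencyLabel) -> bool:
    """Return True if the evidence span is non-empty and appears verbatim in the letter."""
    return bool(label.evidence) and label.evidence in letter


# Invented example sentence, not taken from any real or published letter.
letter = "She reports two to three focal seizures per month since her last review."
label = SeizureFrequencyLabel(
    kind="range",
    count_low=2,
    count_high=3,
    period="month",
    rationale="The letter states a range of monthly focal seizures.",
    evidence="two to three focal seizures per month",
)

print(evidence_is_grounded(letter, label))  # True
```

A verbatim-substring check like this is one simple way an evidence span could let a reviewer jump straight to the supporting sentence; a real pipeline might need fuzzier matching.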
The researchers then fine-tuned several open-weight language models (ranging from 4B to 14B parameters) exclusively on this synthetic data. When tested on a clinician-verified set of real epilepsy clinic letters, the models achieved micro-F1 scores of up to 0.788 for fine-grained seizure categories and 0.858 for pragmatic clinical categories. Notably, a medically oriented 4B-parameter model performed nearly as well as larger models, demonstrating efficient specialization. The structured label prediction approach consistently outperformed direct numeric regression, and evidence-grounded outputs enabled rapid clinical verification.
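For readers unfamiliar with the metric, the sketch below shows how micro-F1 can be computed when each letter receives exactly one predicted category. It pools true positives, false positives, and false negatives across all categories before computing a single score. The `micro_f1` function, the category names, and the toy labels are assumptions made for illustration; the paper's actual category scheme and evaluation code are not reproduced here.

```python
from typing import Sequence


def micro_f1(gold: Sequence[str], pred: Sequence[str]) -> float:
    """Micro-averaged F1 for single-label classification.

    Counts are pooled over all categories:
    F1 = 2*TP / (2*TP + FP + FN).
    """
    if len(gold) != len(pred):
        raise ValueError("gold and pred must have the same length")
    tp = fp = fn = 0
    for g, p in zip(gold, pred):
        if g == p:
            tp += 1   # correct category
        else:
            fp += 1   # the predicted category gains a false positive
            fn += 1   # the gold category gains a false negative
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


# Hypothetical category names and toy data, for illustration only.
gold = ["seizure_free", "monthly", "weekly", "daily", "monthly"]
pred = ["seizure_free", "monthly", "monthly", "daily", "monthly"]

print(micro_f1(gold, pred))  # 0.8
```

In this single-label setting, micro-F1 equals plain accuracy, because every error adds exactly one false positive and one false negative.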
This work demonstrates that synthetic, structured, evidence-grounded supervision can enable robust clinical information extraction without sharing sensitive patient text. The framework shows particular promise for extracting temporally complex clinical data and could generalize to other medical domains where free-text documentation presents annotation challenges. The privacy-preserving approach addresses significant barriers in medical AI development while maintaining clinical utility.
- Generated 15,000 synthetic NHS-style clinic letters using a teacher language model, creating privacy-preserving training data
- Fine-tuned open-weight models (4B-14B parameters) reached micro-F1 scores of up to 0.858 on real letters after training only on synthetic data
- Structured label prediction outperformed direct numeric regression, with evidence-grounded outputs supporting clinical verification
Why It Matters
Enables medical AI development without sharing sensitive patient data, potentially accelerating epilepsy research and clinical care.