Temporally Phenotyping GLP-1RA Case Reports with Large Language Models: A Textual Time Series Corpus and Risk Modeling

A new AI system transforms messy medical narratives into structured timelines for risk analysis.

Deep Dive

Researchers have applied large language models (LLMs) to a long-standing problem in medical informatics: extracting structured timelines from the complex, narrative prose of clinical case reports. The team, led by Sayantan Kumar and Jeremy C. Weiss, created a new textual time-series corpus from 136 PubMed Open Access case reports involving GLP-1 receptor agonists (GLP-1RAs), a drug class widely used for diabetes and weight loss. They then evaluated several LLMs on their ability to identify clinical events, such as symptoms, diagnoses, and treatments, and to pin each event to its most probable reference time, comparing the models' output to timelines annotated by clinical experts.

The best-performing model, referred to as GPT-5, achieved high scores for both event coverage (0.871) and the correct sequencing of those events over time (0.843). This temporal phenotyping, which turns narrative into structured data, enabled a downstream application: the researchers used the extracted timelines to perform a formal time-to-event (survival) analysis. That analysis suggested that patients using GLP-1RAs had a significantly lower risk of developing respiratory sequelae compared to non-users, with a hazard ratio (HR) of 0.259, which corresponds to a roughly 74% lower hazard. This finding aligns with emerging reports of the drugs' benefits beyond metabolic health.
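To make the two evaluation ideas concrete, here is a minimal Python sketch of how event coverage and temporal ordering agreement could be computed on a textual time series. It is an illustration only, not the paper's scoring code: the toy timelines, the exact-string matching rule, and the function names `event_coverage` and `ordering_concordance` are all assumptions, and the paper's actual matching and concordance definitions may differ.

```python
from itertools import combinations

# Hypothetical toy timelines: (event text, time in hours relative to admission).
# These values are invented for illustration and are not taken from the corpus.
reference = [
    ("nausea", -72.0),
    ("abdominal pain", -24.0),
    ("admission", 0.0),
    ("pancreatitis diagnosis", 6.0),
]
predicted = [
    ("nausea", -48.0),
    ("admission", 0.0),
    ("pancreatitis diagnosis", -1.0),  # placed before admission: an ordering error
    ("fever", 2.0),                    # extra event absent from the reference
]


def event_coverage(reference, predicted):
    """Fraction of reference events found in the prediction (exact string match)."""
    predicted_events = {event for event, _ in predicted}
    matched = [event for event, _ in reference if event in predicted_events]
    return len(matched) / len(reference)


def ordering_concordance(reference, predicted):
    """Fraction of matched event pairs whose relative order agrees in both timelines."""
    ref_time = dict(reference)
    pred_time = dict(predicted)
    shared = [event for event in ref_time if event in pred_time]
    agree = 0
    total = 0
    for a, b in combinations(shared, 2):
        ref_diff = ref_time[a] - ref_time[b]
        pred_diff = pred_time[a] - pred_time[b]
        if ref_diff == 0:
            continue  # skip pairs that are tied in the reference
        total += 1
        if ref_diff * pred_diff > 0:
            agree += 1
    return agree / total if total else float("nan")


print(event_coverage(reference, predicted))
print(ordering_concordance(reference, predicted))
```

On this toy input, three of the four reference events are recovered, and two of the three matched pairs keep their relative order. A real evaluation would need fuzzier event matching than exact strings, since an LLM rarely reproduces an annotator's wording verbatim.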

This research, presented for the AMIA Annual Symposium, provides a concrete blueprint for using advanced LLMs as sophisticated information extraction tools in biomedicine. By converting 'doctor's notes' into analyzable time-series data, the method opens the door to large-scale, retrospective studies using the vast trove of existing medical literature. The team plans to release their temporal annotations and code upon publication, providing a valuable resource for the computational medicine community.

Key Points
  • GPT-5 achieved 0.871 event coverage and 0.843 temporal sequencing accuracy on 136 GLP-1RA case reports.
  • The extracted timelines enabled a survival analysis suggesting GLP-1RA users had a roughly 74% lower hazard (HR=0.259) of respiratory sequelae.
  • The created corpus and method turn unstructured clinical narratives into structured data for longitudinal modeling and retrospective studies.
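
The 74% figure follows directly from the hazard ratio. A short sketch of the arithmetic, where the helper name `percent_hazard_reduction` is hypothetical and introduced only for illustration:

```python
def percent_hazard_reduction(hazard_ratio):
    """Convert a hazard ratio below 1 into a percent reduction in hazard."""
    return (1.0 - hazard_ratio) * 100.0


# HR = 0.259 is the value reported in the summary above.
reduction = percent_hazard_reduction(0.259)
print(f"{reduction:.1f}% lower hazard")
```

Strictly speaking, a hazard ratio compares instantaneous event rates between groups rather than cumulative risk, so "74% lower risk" is a common shorthand, not an exact statement.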

Why It Matters

The results suggest that LLMs can automate much of the conversion of medical narratives into analyzable data, which could accelerate drug safety research and retrospective studies.