Research & Papers

EQ-5D Classification Using Biomedical Entity-Enriched Pre-trained Language Models and Multiple Instance Learning

New method automates screening of medical literature for EQ-5D studies, reaching an 82% F1-score and reducing manual screening effort.

Deep Dive

Researchers Zhyar Rostam and Gábor Kertész have published a method that improves automated screening of medical literature for EQ-5D studies. The EQ-5D is a standardized instrument for measuring health-related quality of life and is widely used in health economics, but identifying relevant studies among thousands of publications has traditionally required tedious manual review. Their approach addresses this bottleneck by fine-tuning pre-trained language models (PLMs), namely general-purpose BERT and the domain-specific SciBERT and BioBERT, and enriching them with biomedical entities extracted using scispaCy models. The added entity information is meant to help the models handle medical context and terminology.
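As a rough illustration of what entity enrichment can look like, the sketch below appends extracted entity mentions to a sentence before it would be passed to a PLM tokenizer. The summary does not say how the authors inject entities, so the `[SEP]`-joined format, the function name `enrich_with_entities`, and the toy lexicon are all assumptions made for illustration. A real pipeline would presumably swap the toy extractor for a scispaCy model.

```python
from typing import Callable, List

def enrich_with_entities(
    sentence: str,
    extract_entities: Callable[[str], List[str]],
    sep: str = " [SEP] ",
) -> str:
    """Append entity mentions to a sentence (hypothetical enrichment format)."""
    unique: List[str] = []
    seen_lower = set()
    for ent in extract_entities(sentence):
        key = ent.lower()
        if key not in seen_lower:  # drop repeated mentions, keep first-seen order
            seen_lower.add(key)
            unique.append(ent)
    if not unique:
        return sentence  # nothing to add, leave the sentence unchanged
    return sentence + sep + " ; ".join(unique)

# Toy stand-in for a biomedical NER model (assumption, not the authors' setup).
# With scispaCy installed, an extractor could instead be built roughly as:
#   nlp = spacy.load("en_core_sci_sm")
#   extractor = lambda s: [ent.text for ent in nlp(s).ents]
TOY_LEXICON = ("EQ-5D", "quality of life", "utility")

def toy_extractor(sentence: str) -> List[str]:
    lowered = sentence.lower()
    return [term for term in TOY_LEXICON if term.lower() in lowered]

example = "Health-related quality of life was measured with the EQ-5D."
enriched = enrich_with_entities(example, toy_extractor)
print(enriched)
```

Keeping the extractor as a plain callable means the same enrichment step can be reused with different NER models, which is one way the comparison across scispaCy models could be organized.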

The team ran nine experimental setups, pairing each of three scispaCy models with each of three PLMs, and implemented a Multiple Instance Learning (MIL) framework with attention pooling to aggregate sentence-level predictions into study-level classifications. The results show consistent improvements, reaching an 82% F1-score and nearly perfect recall at the study level, outperforming both classical bag-of-words baselines and recently reported PLM benchmarks. The results suggest that entity enrichment helps adapt the models to the biomedical domain, enabling more accurate automated screening that could substantially reduce manual effort in systematic literature reviews while improving consistency and reducing human error.
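The aggregation step can be sketched as follows: each study is treated as a bag of sentence vectors, an attention score is computed per sentence, and the softmax-weighted average is fed to a study-level classifier. This is a simplified, dependency-free sketch with fixed rather than learned parameters. The dot-product scoring, the function names, and the example numbers are assumptions, not the authors' implementation.

```python
import math
from typing import List, Tuple

def softmax(scores: List[float]) -> List[float]:
    """Numerically stable softmax over a list of scores."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def attention_pool(
    sentence_vectors: List[List[float]],
    attention_vector: List[float],
) -> Tuple[List[float], List[float]]:
    """Pool a bag of sentence vectors into one study vector.

    Scores are plain dot products with `attention_vector`, which would be a
    learned parameter in a trained model (simplifying assumption here).
    """
    if not sentence_vectors:
        raise ValueError("a study must contain at least one sentence")
    scores = [
        sum(x * w for x, w in zip(vec, attention_vector))
        for vec in sentence_vectors
    ]
    weights = softmax(scores)
    dim = len(sentence_vectors[0])
    pooled = [
        sum(w * vec[i] for w, vec in zip(weights, sentence_vectors))
        for i in range(dim)
    ]
    return pooled, weights

def classify_study(
    sentence_vectors: List[List[float]],
    attention_vector: List[float],
    clf_weights: List[float],
    clf_bias: float,
) -> Tuple[float, List[float]]:
    """Return the study-level probability and the per-sentence attention weights."""
    pooled, weights = attention_pool(sentence_vectors, attention_vector)
    logit = sum(p * c for p, c in zip(pooled, clf_weights)) + clf_bias
    prob = 1.0 / (1.0 + math.exp(-logit))
    return prob, weights

# Three made-up 2-d sentence vectors standing in for PLM sentence embeddings.
bag = [[0.1, 0.0], [2.0, 1.0], [0.0, 0.2]]
prob, weights = classify_study(bag, [1.0, 1.0], [1.0, -0.5], 0.0)
print(prob, weights)
```

The attention weights also indicate which sentences drove the study-level decision, which can help a reviewer check a flagged abstract.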

Key Points
  • Achieved 82% F1-score and near-perfect recall for detecting EQ-5D studies in medical abstracts
  • Combined biomedical entity enrichment (via scispaCy) with PLMs like BERT, SciBERT, and BioBERT across nine experimental setups
  • Used Multiple Instance Learning with attention pooling to aggregate sentence-level data into study-level predictions
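
The nine setups form a 3 × 3 grid, which can be enumerated as in the short sketch below. The summary does not name the exact scispaCy models or PLM checkpoints, so the identifiers listed here are assumptions chosen only because they are commonly used public names.

```python
from itertools import product

# Assumed identifiers; the summary does not say which models were used.
SCISPACY_MODELS = ["en_core_sci_sm", "en_core_sci_md", "en_core_sci_lg"]
PLMS = {
    "BERT": "bert-base-uncased",
    "SciBERT": "allenai/scibert_scivocab_uncased",
    "BioBERT": "dmis-lab/biobert-base-cased-v1.1",
}

# One experiment per (entity extractor, language model) pair.
setups = [
    {"ner_model": ner, "plm_name": name, "checkpoint": checkpoint}
    for ner, (name, checkpoint) in product(SCISPACY_MODELS, PLMS.items())
]

for setup in setups:
    print(f"{setup['ner_model']} + {setup['plm_name']}")
```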

Why It Matters

Automating the screening of medical literature for EQ-5D studies could save researchers time and make health economics reviews more consistent.