AI Safety

Fairness gaps in AI drug discovery: new review calls for bias metrics

10-page paper reveals how dataset splits and reward design skew results across cancer types.

Deep Dive

Deep reinforcement learning (DRL) is revolutionizing de novo molecular design, but a new rapid evidence review from researchers at the University of Calgary and collaborators reveals a critical blind spot: fairness. The 10-page paper, accepted at IEEE COMPSAC 2026 and posted on arXiv, synthesizes evidence on how dataset composition, reward design, and evaluation metrics can introduce bias against specific disease areas or chemotypes.

The authors examined three core questions: how dataset splitting strategies (scaffold vs. random) affect distribution shift, how reward functions like QED, docking scores, toxicity, and synthetic accessibility can inadvertently bias outputs toward well-studied cancer types, and which measurable metrics best capture fairness. They propose tracking parity across cancer vs. non-cancer indications, across cancer subtypes, and distributional balance in physicochemical descriptors and scaffold diversity. The review offers concrete guidance for reporting distribution parity and outcome parity, highlighting open gaps for trustworthy, cancer-relevant DRL generation.
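To make the scaffold-versus-random distinction concrete, here is a minimal sketch in plain Python. It is not the review's code: the molecule IDs, the scaffold labels (A to D), and the helper names are hypothetical, and scaffolds are assumed to be precomputed strings (in practice they would come from a cheminformatics toolkit). A scaffold split keeps every molecule that shares a scaffold on the same side of the split, so the test set contains chemotypes the model never saw in training; a random split usually does not guarantee that.

```python
import random
from collections import defaultdict

def random_split(items, test_frac=0.2, seed=0):
    """Shuffle molecules and cut; scaffolds may appear on both sides."""
    rng = random.Random(seed)
    shuffled = items[:]
    rng.shuffle(shuffled)
    n_test = int(round(len(shuffled) * test_frac))
    return shuffled[n_test:], shuffled[:n_test]

def scaffold_split(items, test_frac=0.2):
    """Assign whole scaffold groups to train or test (greedy, largest first)."""
    groups = defaultdict(list)
    for mol_id, scaffold in items:
        groups[scaffold].append((mol_id, scaffold))
    # Largest groups first; ties broken by scaffold label for determinism.
    ordered = sorted(groups.values(), key=lambda g: (-len(g), g[0][1]))
    n_train_target = len(items) - int(round(len(items) * test_frac))
    train, test = [], []
    for group in ordered:
        if len(train) + len(group) <= n_train_target:
            train.extend(group)
        else:
            test.extend(group)
    return train, test

def shared_scaffolds(train, test):
    """Scaffolds present on both sides of a split (a sign of leakage)."""
    return {s for _, s in train} & {s for _, s in test}

# Hypothetical toy data: (molecule id, precomputed scaffold label).
molecules = (
    [(f"a{i}", "A") for i in range(4)]
    + [(f"b{i}", "B") for i in range(3)]
    + [(f"c{i}", "C") for i in range(2)]
    + [("d0", "D")]
)

rand_train, rand_test = random_split(molecules)
scaf_train, scaf_test = scaffold_split(molecules)

print("random split shared scaffolds:  ", shared_scaffolds(rand_train, rand_test))
print("scaffold split shared scaffolds:", shared_scaffolds(scaf_train, scaf_test))
```

In this toy setup the scaffold split places the two scaffold-C molecules in the test set and nothing else, so no scaffold is shared between train and test. Comparing a model's metrics under both splits is one simple way to see how much distribution shift the split choice introduces.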

Key Points
  • Dataset split choice (scaffold vs. random) significantly alters distribution shift and fairness across chemotypes.
  • Reward functions like QED and docking scores can create bias toward well-studied cancer targets, neglecting rare subtypes.
  • Proposed fairness metrics include groupwise validity, toxicity parity, and scaffold/chemotype diversity across disease domains.
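As a rough illustration of how such groupwise metrics could be reported, the sketch below computes validity and toxicity rates per disease group, a simple parity gap, and a scaffold diversity ratio. The formulas are assumptions chosen for illustration (the gap is taken as the largest rate minus the smallest, and diversity as unique scaffolds divided by molecule count); the review may define its metrics differently. The group names and records are hypothetical toy data, not results from the paper.

```python
from collections import defaultdict

def groupwise_rate(records, flag):
    """Fraction of molecules in each group for which `flag` is true."""
    totals = defaultdict(int)
    hits = defaultdict(int)
    for rec in records:
        totals[rec["group"]] += 1
        hits[rec["group"]] += 1 if rec[flag] else 0
    return {g: hits[g] / totals[g] for g in totals}

def parity_gap(rates):
    """Assumed parity measure: largest groupwise rate minus smallest."""
    return max(rates.values()) - min(rates.values())

def scaffold_diversity(records):
    """Unique scaffolds per generated molecule, computed per group."""
    scaffolds = defaultdict(set)
    counts = defaultdict(int)
    for rec in records:
        scaffolds[rec["group"]].add(rec["scaffold"])
        counts[rec["group"]] += 1
    return {g: len(scaffolds[g]) / counts[g] for g in counts}

def make(group, scaffold, valid, toxic):
    return {"group": group, "scaffold": scaffold, "valid": valid, "toxic": toxic}

# Hypothetical generated molecules for two illustrative disease groups.
generated = [
    make("common_cancer", "A", True, False),
    make("common_cancer", "B", True, False),
    make("common_cancer", "C", True, True),
    make("common_cancer", "A", True, False),
    make("rare_subtype", "A", True, True),
    make("rare_subtype", "A", False, True),
    make("rare_subtype", "A", True, False),
    make("rare_subtype", "A", False, False),
]

validity = groupwise_rate(generated, "valid")
toxicity = groupwise_rate(generated, "toxic")
diversity = scaffold_diversity(generated)

print("validity by group:", validity, "gap:", parity_gap(validity))
print("toxicity by group:", toxicity, "gap:", parity_gap(toxicity))
print("scaffold diversity by group:", diversity)
```

A gap near zero suggests the generator treats the groups similarly on that measure; a large gap flags a group, such as a rare subtype, that may be underserved.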

Why It Matters

As DRL speeds drug discovery, unchecked bias could widen treatment gaps. This review gives teams a fairness toolkit for measuring and reporting it.