CTG-DB: An Ontology-Based Transformation of ClinicalTrials.gov to Enable Cross-Trial Drug Safety Analyses

New open-source pipeline standardizes 400,000+ trial records using MedDRA ontology to enable systematic pharmacovigilance.

Deep Dive

Researchers have developed CTG-DB, an open-source pipeline that addresses a critical bottleneck in drug safety research by transforming ClinicalTrials.gov's massive registry, in which adverse events are recorded as heterogeneous, investigator-reported free text, into a standardized relational database. The system ingests the complete XML archive of the world's largest clinical trial registry and applies the Medical Dictionary for Regulatory Activities (MedDRA) ontology to normalize adverse event terms, using deterministic exact and fuzzy matching. The transformation preserves key trial metadata, including arm-level denominators (the number of participants at risk in each arm) and the identity of placebo and comparator arms, and produces a transparent, reproducible mapping from reported text to MedDRA concepts.
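The exact-then-fuzzy matching idea can be pictured with a minimal sketch using Python's standard `difflib`. This is not CTG-DB's code: the `VOCAB` dictionary, its placeholder `PT_000x` IDs (real MedDRA terms and codes are licensed), the `normalize` and `map_term` helpers, and the 0.85 similarity cutoff are all hypothetical choices made for illustration.

```python
import difflib
import re
from typing import Optional, Tuple

# Hypothetical mini-vocabulary: normalized term -> placeholder concept ID.
# Real MedDRA terms and codes are licensed; these IDs are invented.
VOCAB = {
    "nausea": "PT_0001",
    "headache": "PT_0002",
    "vomiting": "PT_0003",
    "dizziness": "PT_0004",
    "rash": "PT_0005",
}


def normalize(term: str) -> str:
    """Lowercase, replace punctuation with spaces, collapse whitespace."""
    cleaned = re.sub(r"[^\w\s]", " ", term.lower())
    return " ".join(cleaned.split())


def map_term(raw: str, cutoff: float = 0.85) -> Tuple[Optional[str], str]:
    """Map an investigator-reported term to (concept_id, match_type)."""
    key = normalize(raw)
    # Step 1: exact match on the normalized string.
    if key in VOCAB:
        return VOCAB[key], "exact"
    # Step 2: fuzzy fallback. difflib's similarity scoring has no randomness,
    # so the same input and vocabulary always yield the same mapping.
    close = difflib.get_close_matches(key, list(VOCAB), n=1, cutoff=cutoff)
    if close:
        return VOCAB[close[0]], "fuzzy"
    # Step 3: keep unmatched terms visible for manual review.
    return None, "unmapped"


if __name__ == "__main__":
    for raw in ["Nausea", "Head-ache", "Vomitting", "Fatigue"]:
        print(raw, map_term(raw))
```

Running the exact pass first means the fuzzy threshold only applies to terms that fail after normalization, which limits spurious merges; the cutoff value here is arbitrary and would need tuning against a real vocabulary.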

By converting investigator-reported text into standardized identifiers, CTG-DB enables systematic pharmacovigilance analytics that were previously limited by ClinicalTrials.gov's registry-oriented architecture and heterogeneous terminology. The database supports concept-level retrieval and cross-trial aggregation, allowing researchers to perform scalable placebo-referenced safety analyses across hundreds of thousands of trials. This structured approach facilitates integration of clinical trial evidence into downstream pharmacovigilance signal detection systems, potentially accelerating drug safety monitoring and regulatory decision-making.
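Concept-level retrieval and placebo-referenced aggregation can be illustrated with a small relational sketch using Python's built-in `sqlite3`. Everything here is hypothetical: the `arm` and `ae_count` tables, their columns, the `PT_0001` concept ID, and the demo counts are invented. The Mantel-Haenszel risk ratio, stratified by trial so that each trial's own arm-level denominators are respected, is a standard pooling method swapped in for illustration, not necessarily the statistic CTG-DB or its authors use. Multiple active arms in one trial are simply pooled, and an arm with no row for the concept is treated as having zero events; both are simplifications.

```python
import math
import sqlite3

# Hypothetical schema for illustration; not CTG-DB's published schema.
SCHEMA = """
CREATE TABLE arm (
    trial_id   TEXT,
    arm_id     TEXT,
    is_placebo INTEGER,   -- 1 = placebo/comparator arm, 0 = active arm
    at_risk    INTEGER,   -- arm-level denominator
    PRIMARY KEY (trial_id, arm_id)
);
CREATE TABLE ae_count (
    trial_id   TEXT,
    arm_id     TEXT,
    concept_id TEXT,      -- standardized adverse event concept
    affected   INTEGER    -- participants with the event in this arm
);
"""

# Per-trial active vs placebo totals for one concept. Summing `affected`
# assumes rows for the same arm and concept do not overlap. Trials that
# lack either an active or a placebo arm are dropped by HAVING.
QUERY = """
SELECT a.trial_id,
       SUM(CASE WHEN a.is_placebo = 0 THEN COALESCE(e.affected, 0) END),
       SUM(CASE WHEN a.is_placebo = 0 THEN a.at_risk END),
       SUM(CASE WHEN a.is_placebo = 1 THEN COALESCE(e.affected, 0) END),
       SUM(CASE WHEN a.is_placebo = 1 THEN a.at_risk END)
FROM arm AS a
LEFT JOIN (
    SELECT trial_id, arm_id, SUM(affected) AS affected
    FROM ae_count
    WHERE concept_id = ?
    GROUP BY trial_id, arm_id
) AS e ON e.trial_id = a.trial_id AND e.arm_id = a.arm_id
GROUP BY a.trial_id
HAVING SUM(a.is_placebo) > 0 AND SUM(1 - a.is_placebo) > 0
"""


def mh_risk_ratio(conn, concept_id):
    """Mantel-Haenszel risk ratio (active vs placebo), stratified by trial."""
    num = den = 0.0
    for _trial, a, n1, c, n0 in conn.execute(QUERY, (concept_id,)):
        total = n1 + n0
        num += a * n0 / total
        den += c * n1 / total
    return num / den if den else math.nan


def build_demo_db():
    """In-memory database with invented counts for one concept."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    conn.executemany("INSERT INTO arm VALUES (?, ?, ?, ?)", [
        ("T1", "A1", 0, 100), ("T1", "P1", 1, 100),
        ("T2", "A2", 0, 50), ("T2", "P2", 1, 50),
        ("T3", "A3", 0, 30),  # no placebo arm: excluded from the estimate
    ])
    conn.executemany("INSERT INTO ae_count VALUES (?, ?, ?, ?)", [
        ("T1", "A1", "PT_0001", 20), ("T1", "P1", "PT_0001", 10),
        ("T2", "A2", "PT_0001", 6), ("T2", "P2", "PT_0001", 3),
        ("T3", "A3", "PT_0001", 5),
    ])
    return conn


if __name__ == "__main__":
    print(mh_risk_ratio(build_demo_db(), "PT_0001"))  # → 2.0
```

Stratifying by trial, rather than summing counts across all trials before dividing, avoids treating arms from trials with very different sizes or baseline risks as if they came from a single study.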

The framework represents a significant advance for evidence-based medicine, providing a reproducible method for transforming registry-reported clinical trial results into analyzable formats. As an open-source tool, CTG-DB could democratize access to systematic drug safety analysis, enabling researchers, regulators, and pharmaceutical companies to conduct more comprehensive safety assessments without the manual reconciliation work that previously limited such analyses.

Key Points
  • Transforms ClinicalTrials.gov's XML archive, with its heterogeneous free-text adverse event terms, into a MedDRA-aligned relational database using deterministic exact and fuzzy matching
  • Enables concept-level retrieval and cross-trial aggregation for systematic placebo-referenced safety analyses across 400,000+ trials
  • Open-source pipeline preserves arm-level denominators and comparator data for transparent, reproducible pharmacovigilance research
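To make "arm-level denominators and comparator data" concrete, here is a sketch that pulls per-arm counts out of a single study record with Python's standard `xml.etree.ElementTree`. The element and attribute names (`reported_events`, `group_list`, `counts`, `subjects_affected`, `subjects_at_risk`) are an assumption modeled on the legacy ClinicalTrials.gov results XML and should be checked against the real schema; the sample record is invented, and the title-based placebo flag is a deliberately naive stand-in for however CTG-DB actually identifies comparator arms.

```python
import xml.etree.ElementTree as ET

# Hypothetical record; element names are assumed to follow the legacy
# ClinicalTrials.gov results XML and should be verified against real files.
SAMPLE_RECORD = """
<clinical_study>
  <reported_events>
    <group_list>
      <group group_id="E1"><title>Drug X 10 mg</title></group>
      <group group_id="E2"><title>Placebo</title></group>
    </group_list>
    <other_events>
      <category_list>
        <category>
          <title>Gastrointestinal disorders</title>
          <event_list>
            <event>
              <sub_title>Nausea</sub_title>
              <counts group_id="E1" subjects_affected="12" subjects_at_risk="150"/>
              <counts group_id="E2" subjects_affected="4" subjects_at_risk="148"/>
            </event>
          </event_list>
        </category>
      </category_list>
    </other_events>
  </reported_events>
</clinical_study>
"""


def extract_arm_counts(xml_text):
    """Return one row per (event term, arm) with numerator and denominator."""
    root = ET.fromstring(xml_text.strip())
    # Map group IDs to arm titles so placebo/comparator arms stay identifiable.
    arm_titles = {
        group.get("group_id"): group.findtext("title", default="")
        for group in root.iterfind("reported_events/group_list/group")
    }
    rows = []
    for event in root.iter("event"):
        term = event.findtext("sub_title", default="").strip()
        for counts in event.findall("counts"):
            title = arm_titles.get(counts.get("group_id"), "")
            rows.append({
                "term": term,
                "arm": title,
                # Naive title-based flag; real arm typing needs more care.
                "is_placebo": "placebo" in title.lower(),
                "affected": int(counts.get("subjects_affected", "0")),
                "at_risk": int(counts.get("subjects_at_risk", "0")),
            })
    return rows


if __name__ == "__main__":
    for row in extract_arm_counts(SAMPLE_RECORD):
        print(row)
```

Keeping `at_risk` next to `affected` on every row is what later allows incidence to be computed per arm instead of from event counts alone.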

Why It Matters

Makes drug safety data from hundreds of thousands of clinical trials available for systematic, AI-assisted analysis of safety signals, potentially accelerating pharmacovigilance and regulatory decisions.