AI Safety

Surgeon warns AI erodes scientific self-correction with fabricated references

28.6%–91.4% of LLM-generated references in systematic-review assistance are fabricated, says new preprint.

Deep Dive

Pediatric surgeon Tuyen Tran proposes a troubling hypothesis: science's self-correction mechanism is eroding in the age of AI. In a preprint published on OSF, he identifies four conditions that historically enabled evidence-based medicine (independent evaluation through peer review, methodological plurality, traceability, and epistemic friction) and argues that each is being undermined. AI compresses the work of synthesizing research from months into days, which removes epistemic friction and makes the resulting conclusions harder to audit. The opacity of LLM 'reasoning' breaks traceability, while a growing 'monoculture' of AI-assisted methods reduces plurality. Most critically, the independence between those who generate research and those who evaluate it collapses when both use the same AI tools.

Empirical signals back the claim: 28.6%–91.4% of LLM-generated references in systematic-review assistance are fabricated; only 6% of published AI models in pediatric surgery are both interpretable and externally validated; and an audit of 2,271 evidence syntheses (2017–2024) documents automation spreading across search, screening, and extraction. Tran terms this syndrome 'epistemic immunodepression', a passive weakening through scale, opacity, and lost independence. Current governance, he argues, cannot detect these structural failure modes. He calls for verifiable fixes: a research record, an AI logbook, evidence-pyramid recalibration, and peer-review AI accountability. The stakes are stark: a journal can retract a paper, but a surgeon cannot reverse a decision already executed on a child.

Key Points
  • 28.6%–91.4% of LLM-generated references in systematic-review assistance are fabricated.
  • Only 6% of published AI models in pediatric surgery are both interpretable and externally validated.
  • An audit of 2,271 evidence syntheses (2017–2024) shows automation spreading across search, screening, and extraction.

Why It Matters

Eroding scientific self-correction risks irreversible harm, especially in surgery, where retractions can't undo decisions on patients.