Media & Culture

AI Just Beat Doctors at Diagnosing ER Patients. Don’t Get All Excited

Harvard study shows OpenAI's o1-preview scores 67.1% accuracy vs. 55.3% for physicians

Deep Dive

Researchers at Harvard Medical School and Beth Israel Deaconess Medical Center tested OpenAI’s o1-preview reasoning model against two attending physicians in diagnosing emergency room patients. The AI achieved 67.1% accuracy across 76 real ER cases, outperforming the physicians’ scores of 55.3% and 50.0%. When evaluated on 143 complex cases from *The New England Journal of Medicine*, o1-preview included the correct diagnosis in 78.3% of cases and suggested a helpful differential diagnosis in 97.9%, surpassing GPT-4 and a 44.5% human baseline reported in a *Nature* study.

The team stresses that AI is not meant to replace doctors but to augment their work, with clinicians retaining oversight and accountability. Study coauthor Adam Rodman compared AI’s role to existing clinical decision support tools, noting that robust evidence—such as randomized controlled trials—would be required before widespread adoption. However, o1-preview still struggles with multimodal inputs like medical imaging, where human doctors excel, highlighting an active area for future research.

Key Points
  • OpenAI’s o1-preview achieved 67.1% diagnostic accuracy vs. 55.3% and 50.0% for two physicians in a Harvard/Beth Israel study of 76 ER cases.
  • The model included correct diagnoses in 78.3% of 143 complex cases and proposed helpful differentials in 97.9% of cases.
  • Researchers emphasize AI as a collaborative tool and note that rigorous testing, along with legal and regulatory hurdles, stands between these results and full clinical adoption.

Why It Matters

AI could reshape emergency medicine by augmenting diagnostics, but regulatory and multimodal challenges remain before widespread clinical use.