There are more AI health tools than ever—but how well do they work?
Major tech firms are releasing AI health chatbots, but a new study reveals critical safety gaps in triage advice.
Microsoft, Amazon, and OpenAI are rapidly deploying AI-powered health tools for consumers, capitalizing on massive user demand. Microsoft's new Copilot Health app and Amazon's newly public Health AI tool join OpenAI's ChatGPT Health and Anthropic's health-record-enabled Claude. The push is driven by data: Microsoft reports that its Copilot app fields 50 million health-related questions daily, making health the platform's most popular topic. Developers cite both improved LLM capabilities and critical gaps in traditional healthcare access as key reasons for the rollout.
Despite the promise, significant concerns about safety and efficacy persist. A recent study from researchers at Mount Sinai Health System found that ChatGPT Health demonstrated flawed triage logic, sometimes failing to identify emergencies while over-recommending care for mild conditions. This highlights a central tension: while companies like OpenAI conduct internal evaluations, experts argue for mandatory, independent review before public release in such a high-stakes domain. The vision of AI chatbots reducing healthcare system strain hinges on their accuracy, which current evidence suggests is not yet assured.
- Microsoft's Copilot Health app fields 50 million health questions daily, driving the rush to market.
- A Mount Sinai study found ChatGPT Health has critical triage flaws, missing some emergencies and over-recommending care for mild conditions.
- Experts demand mandatory independent evaluation before wide release, citing high stakes and potential corporate blind spots.
Why It Matters
Widespread deployment of unvetted AI health advisors could lead to missed emergencies and misdiagnoses on one end, and unnecessary visits that strain emergency services on the other.