Research & Papers

M3 lets researchers query clinical databases in plain English with 94% accuracy

Ask MIMIC-IV in English: M3 reaches 94% accuracy with Claude Sonnet 4 and 93% with an open-weights model that can run locally.

Deep Dive

Large clinical databases like MIMIC-IV hold massive research potential, but using them has traditionally required both SQL expertise and clinical domain knowledge, a steep barrier for many researchers. M3, developed by Attrach et al., addresses this by putting a natural language interface in front of the database, built on the Model Context Protocol. With a single command, researchers can download MIMIC-IV, spin up a local SQLite instance (or connect to BigQuery), and ask clinical questions in plain English. The system translates each question into SQL, executes it, and returns structured results alongside the generated query so the researcher can inspect exactly what was run.
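The execute-and-return step described above can be sketched in a few lines. This is a minimal illustration, not M3's actual code: the function name `run_generated_query`, the toy `admissions` table, and the hard-coded SQL string (standing in for what a language model would generate) are all assumptions made for the example. It uses only Python's standard `sqlite3` module.

```python
import sqlite3


def run_generated_query(conn, sql):
    """Run a SQL string and return the rows together with the query text.

    Hypothetical helper for illustration; returning the query alongside
    the rows is what lets a researcher audit what was actually executed.
    """
    cursor = conn.execute(sql)
    columns = [col[0] for col in cursor.description]
    rows = [dict(zip(columns, row)) for row in cursor.fetchall()]
    return {"query": sql, "columns": columns, "rows": rows}


# Toy in-memory table; this is NOT the real MIMIC-IV schema.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE admissions (subject_id INTEGER, admission_type TEXT)")
conn.executemany(
    "INSERT INTO admissions VALUES (?, ?)",
    [(1, "EMERGENCY"), (2, "ELECTIVE"), (3, "EMERGENCY")],
)

# Stand-in for model output for a question like
# "How many admissions are there of each type?"
generated_sql = (
    "SELECT admission_type, COUNT(*) AS n "
    "FROM admissions GROUP BY admission_type ORDER BY admission_type"
)

result = run_generated_query(conn, generated_sql)
print(result["query"])
for row in result["rows"]:
    print(row)
```

Keeping the query text in the same object as the rows means the two cannot drift apart when results are logged or shared.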

In evaluations using the EHRSQL 2024 benchmark, Claude Sonnet 4 achieved 94% accuracy on 100 answerable questions, while the smaller open-weights gpt-oss-20B model (deployable on consumer hardware) scored 93%. On unanswerable questions, gpt-oss-20B correctly abstained 69% of the time, a behavior that matters in clinical settings where hallucinated answers could be dangerous. Error analysis showed that most failures came from complex temporal reasoning or ambiguous phrasing rather than from the model architecture. M3's local deployment option lets institutions analyze sensitive patient data without sending it to external APIs, and the system adds security measures including OAuth2 authentication, query validation, and audit logging.
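The summary does not say how M3 validates queries, so the following is only a sketch of one common approach: accept a single read-only statement and reject anything that could modify the database. The function name `is_safe_select` and the keyword list are assumptions for illustration. The check is deliberately conservative, so it will also reject some harmless queries (for example, ones that call SQLite's `REPLACE()` string function or mention a blocked keyword inside a string literal).

```python
import re

# Keywords that can change data or database state in SQLite.
# Conservative by design: a match anywhere in the query rejects it.
FORBIDDEN = re.compile(
    r"\b(insert|update|delete|replace|drop|alter|create|attach|detach|pragma|vacuum)\b",
    re.IGNORECASE,
)


def is_safe_select(sql: str) -> bool:
    """Return True only for a single statement that looks read-only.

    Hypothetical validator, not M3's implementation.
    """
    statement = sql.strip().rstrip(";").strip()
    if not statement:
        return False
    # A remaining semicolon suggests more than one statement.
    if ";" in statement:
        return False
    # Read-only queries start with SELECT or a WITH clause.
    if not re.match(r"^(select|with)\b", statement, re.IGNORECASE):
        return False
    # WITH can also prefix INSERT/UPDATE/DELETE, so scan the whole text.
    return FORBIDDEN.search(statement) is None


print(is_safe_select("SELECT COUNT(*) FROM admissions;"))
print(is_safe_select("SELECT 1; DROP TABLE admissions"))
```

A keyword filter like this is best treated as one layer. Opening the database connection in read-only mode gives a second guarantee that does not depend on parsing the query text.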

Key Points
  • Claude Sonnet 4 achieved 94% accuracy on answerable clinical queries; open-weights gpt-oss-20B reached 93% and correctly abstained on 69% of unanswerable questions.
  • M3 uses the Model Context Protocol to let researchers query MIMIC-IV in natural language, translating to SQL and executing against SQLite or BigQuery.
  • Local deployment of the open model enables privacy-preserving analysis with OAuth2, query validation, and audit logging built in.

Why It Matters

M3 lowers the technical barrier to clinical data analysis while maintaining security, enabling wider use of EHR databases for critical care research.