Research & Papers

SymptomWise: A Deterministic Reasoning Layer for Reliable and Efficient AI Systems

New medical AI system achieves 88% top-5 accuracy on pediatric neurology cases by separating reasoning from language models.

Deep Dive

A research team led by Isaac Henry and Avery Byrne has introduced SymptomWise, a novel AI framework designed to tackle the persistent issues of reliability and hallucination in AI-driven diagnostic systems. Published on arXiv, the paper proposes a deterministic reasoning layer that fundamentally separates the language understanding capabilities of large language models (LLMs) from the core diagnostic inference process. Instead of relying on end-to-end generative models, SymptomWise uses LLMs only for initial symptom extraction from free text. The extracted symptoms are then mapped to a validated, expert-curated medical knowledge base.
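The extraction-and-mapping step can be sketched as follows. This is a minimal illustration, not the paper's implementation: the knowledge-base schema, synonym table, and function names are all hypothetical, standing in for whatever expert-curated vocabulary SymptomWise actually uses.

```python
# Hypothetical sketch: mapping LLM-extracted free-text symptom phrases to
# canonical knowledge-base terms. The KB contents below are illustrative
# examples, not taken from the paper.

SYMPTOM_KB = {
    "ataxia": {"gait ataxia", "unsteady gait", "clumsiness"},
    "nystagmus": {"nystagmus", "involuntary eye movements"},
    "headache": {"headache", "head pain"},
}

def map_symptoms(extracted_phrases):
    """Map free-text phrases to canonical KB symptom terms.

    Unmapped phrases are returned separately rather than guessed at,
    so nothing enters the reasoning layer without KB grounding.
    """
    mapped, unmapped = [], []
    for phrase in extracted_phrases:
        key = phrase.strip().lower()
        for canonical, synonyms in SYMPTOM_KB.items():
            if key == canonical or key in synonyms:
                mapped.append(canonical)
                break
        else:
            unmapped.append(phrase)
    return mapped, unmapped
```

The key design point is the explicit `unmapped` return: phrases the knowledge base does not recognize are surfaced rather than silently dropped or invented, which is what keeps the downstream inference grounded.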

This mapped data is processed by a deterministic reasoning module that operates over a finite hypothesis space, generating a ranked differential diagnosis through codex-driven inference. This architecture ensures every conclusion is traceable back to the underlying knowledge base, dramatically reducing unsupported or 'hallucinated' outputs. In a preliminary evaluation on 42 expert-authored, challenging pediatric neurology cases, the system demonstrated strong performance, with the correct diagnosis appearing in its top five differentials 88% of the time.
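A deterministic reasoning step of this kind can be sketched in a few lines. Again, this is an assumption-laden illustration: the disease profiles, weights, and scoring rule are invented for the example, but it shows the property the paper emphasizes, namely that every ranked diagnosis carries an evidence trail pointing back to knowledge-base entries.

```python
# Hypothetical sketch of deterministic ranking over a finite hypothesis
# space. Disease profiles and weights are illustrative, not from the paper.

DISEASE_PROFILES = {
    "posterior fossa tumor": {"ataxia": 2.0, "headache": 1.5, "nystagmus": 1.0},
    "migraine": {"headache": 2.0},
    "cerebellitis": {"ataxia": 2.0, "nystagmus": 1.5},
}

def rank_differentials(mapped_symptoms):
    """Score each hypothesis against the mapped symptoms and return a
    ranked differential, with the contributing KB evidence attached."""
    ranked = []
    for disease, profile in DISEASE_PROFILES.items():
        evidence = {s: profile[s] for s in mapped_symptoms if s in profile}
        score = sum(evidence.values())
        if score > 0:
            ranked.append({"diagnosis": disease, "score": score,
                           "evidence": evidence})
    # Deterministic ordering: score descending, name as tie-break,
    # so identical inputs always yield identical output.
    ranked.sort(key=lambda d: (-d["score"], d["diagnosis"]))
    return ranked
```

Because the hypothesis space is finite and the scoring has no sampled or generated component, the same inputs always produce the same ranking, and each entry's `evidence` field makes the conclusion auditable.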

The researchers argue that this separation of concerns, using LLMs for what they do well (language) and deterministic systems for logical reasoning, creates a more reliable and auditable AI. Beyond medicine, the SymptomWise framework is presented as a generalizable pattern for adding a deterministic structuring and routing layer to foundation models, which could improve precision and reduce computational overhead in other bounded, abductive reasoning tasks where safety and traceability are paramount.

Key Points
  • Separates language models from diagnostic reasoning, using LLMs only for symptom extraction and explanation, not for inference.
  • Achieved 88% accuracy (correct diagnosis in top 5) on 42 challenging pediatric neurology cases in preliminary evaluation.
  • Provides a deterministic, traceable framework that reduces hallucinations and can generalize to other safety-critical domains beyond medicine.

Why It Matters

Offers a blueprint for building more reliable, auditable AI systems in healthcare and other high-stakes fields by curbing model hallucinations.