SymptomWise: A Deterministic Reasoning Layer for Reliable and Efficient AI Systems
New medical AI system achieves 88% top-five accuracy on pediatric neurology cases by separating diagnostic reasoning from language models.
A research team led by Isaac Henry and Avery Byrne has introduced SymptomWise, a novel AI framework designed to tackle the persistent issues of reliability and hallucination in AI-driven diagnostic systems. Published on arXiv, the paper proposes a deterministic reasoning layer that fundamentally separates the language understanding capabilities of large language models (LLMs) from the core diagnostic inference process. Instead of relying on end-to-end generative models, SymptomWise uses LLMs only for initial symptom extraction from free text. The extracted symptoms are then mapped to a validated, expert-curated medical knowledge base.
This mapped data is processed by a deterministic reasoning module that operates over a finite hypothesis space, generating a ranked differential diagnosis through inference grounded in the curated knowledge base. This architecture makes every conclusion traceable back to the underlying knowledge base, dramatically reducing unsupported or 'hallucinated' outputs. In a preliminary evaluation on 42 expert-authored, challenging pediatric neurology cases, the system performed strongly, placing the correct diagnosis in its top five differentials 88% of the time.
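The pipeline described above can be sketched in a few lines of Python. This is an illustrative toy, not the paper's implementation: the knowledge base entries, symptom codes, phrase-to-code map, and Jaccard-overlap scoring rule are all assumptions chosen to show how a deterministic, traceable ranking over a finite hypothesis space might work once the LLM has extracted symptom phrases.

```python
# Hypothetical sketch of a SymptomWise-style deterministic reasoning layer.
# All names, KB entries, and the scoring rule are illustrative assumptions.

# Expert-curated knowledge base: each candidate diagnosis (the finite
# hypothesis space) maps to its set of associated symptom codes.
KNOWLEDGE_BASE = {
    "absence_epilepsy": {"staring_spells", "brief_unresponsiveness", "eyelid_fluttering"},
    "migraine": {"headache", "photophobia", "nausea"},
    "tension_headache": {"headache", "neck_stiffness"},
}

# Mapping from free-text phrases (as an LLM might extract them) to
# canonical symptom codes in the knowledge base's controlled vocabulary.
SYMPTOM_MAP = {
    "staring into space": "staring_spells",
    "doesn't respond for a few seconds": "brief_unresponsiveness",
    "headaches": "headache",
    "bothered by bright light": "photophobia",
}

def rank_differentials(extracted_phrases, top_k=5):
    """Deterministically rank diagnoses by overlap with mapped symptoms.

    Every score is a function only of the knowledge base and the mapped
    symptom set, so each ranked diagnosis is traceable to concrete
    KB evidence, with no generative model in the inference loop.
    """
    # Step 1: map extracted phrases onto the controlled vocabulary,
    # dropping phrases with no knowledge-base counterpart.
    symptoms = {SYMPTOM_MAP[p] for p in extracted_phrases if p in SYMPTOM_MAP}

    # Step 2: score each hypothesis, here by Jaccard overlap between the
    # patient's symptom set and the diagnosis's known symptom profile.
    scored = []
    for dx, profile in KNOWLEDGE_BASE.items():
        overlap = symptoms & profile
        if overlap:
            score = len(overlap) / len(symptoms | profile)
            scored.append((dx, score, sorted(overlap)))  # overlap = evidence trail

    # Step 3: ranked differential; ties broken alphabetically for determinism.
    scored.sort(key=lambda t: (-t[1], t[0]))
    return scored[:top_k]

phrases = ["staring into space", "doesn't respond for a few seconds"]
for dx, score, evidence in rank_differentials(phrases):
    print(f"{dx}: score={score:.2f}, evidence={evidence}")
```

Because the ranking depends only on set operations over the curated knowledge base, the same inputs always yield the same differential, and each entry carries the exact symptom codes that justified it.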
The researchers argue that this separation of concerns—using LLMs for what they're good at (language) and deterministic systems for logical reasoning—creates a more reliable and auditable AI. Beyond medicine, the SymptomWise framework is presented as a generalizable pattern for adding a deterministic structuring and routing layer to foundation models. This could improve precision and potentially reduce unnecessary computational overhead in other bounded, abductive reasoning tasks where safety and traceability are paramount.
- Separates language models from diagnostic reasoning, using LLMs only for symptom extraction and explanation, not for inference.
- Achieved 88% accuracy (correct diagnosis in top 5) on 42 challenging pediatric neurology cases in preliminary evaluation.
- Provides a deterministic, traceable framework that reduces hallucinations and can generalize to other safety-critical domains beyond medicine.
Why It Matters
Offers a blueprint for building more reliable, auditable AI systems in healthcare and other high-stakes fields by curbing model hallucinations.