Temporal Reasoning Is Not the Bottleneck: A Probabilistic Inconsistency Framework for Neuro-Symbolic QA
A new framework achieves 100% accuracy on temporal arithmetic by fixing text-to-event conversion, not reasoning.
A new preprint from Tran Quang Liem overturns a common assumption in AI: that large language models (LLMs) struggle with temporal reasoning due to inherent limits in logical deduction. Instead, the paper argues the real bottleneck is converting unstructured text into structured event representations. The proposed framework, driven by a Probabilistic Inconsistency Signal (PIS), explicitly separates semantic extraction from symbolic reasoning. It lifts text into explicit event graphs with interval constraints, then uses Evidential Deep Learning on LLM hidden states to detect structural inconsistencies.
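To make the text-to-event-graph step concrete, here is a minimal Python sketch of the kind of structured representation the paper argues for. The `Event` and `Constraint` types, the timeline units, and the hand-extracted example are illustrative assumptions, not the paper's actual schema.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Event:
    """A node in the event graph: a named event with an optional known interval."""
    name: str
    start: float | None = None  # e.g., years on a shared timeline
    end: float | None = None

@dataclass(frozen=True)
class Constraint:
    """An edge: a qualitative interval relation between two events."""
    relation: str  # e.g., "before", "during", "meets"
    a: str         # source event name
    b: str         # target event name

# Hand-extracted representation of:
# "Ada was born in 1815. She met Babbage in 1833 and died in 1852."
events = {
    "birth": Event("birth", 1815, 1815),
    "meeting": Event("meeting", 1833, 1833),
    "death": Event("death", 1852, 1852),
}
constraints = [
    Constraint("before", "birth", "meeting"),
    Constraint("before", "meeting", "death"),
]
```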
Empirical results are striking: given correct structural representations, the system achieves perfect accuracy (4000/4000) on temporal arithmetic benchmarks, with zero false positives or false negatives. On broader noise-injected QA tasks it maintains 75.1% accuracy while enabling deterministic, step-level failure localization, sketched below.
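Deterministic step-level failure localization follows naturally from such a symbolic layer: each constraint can be checked independently against explicit intervals, and the first violated constraint names the failing step. Continuing the sketch above (same assumed types; only the "before" relation is handled):

```python
def check(events: dict[str, Event], constraints: list[Constraint]) -> int | None:
    """Return the index of the first violated constraint, or None if all hold.

    Only the "before" relation is checked here; a fuller checker would cover
    all thirteen Allen interval relations.
    """
    for i, c in enumerate(constraints):
        a, b = events[c.a], events[c.b]
        if c.relation == "before" and not (a.end is not None
                                           and b.start is not None
                                           and a.end < b.start):
            return i  # deterministic localization: step i is inconsistent
    return None

# A corrupted extraction that swaps two dates is caught at step 0:
bad = dict(events, birth=Event("birth", 1833, 1833),
                   meeting=Event("meeting", 1815, 1815))
assert check(bad, constraints) == 0
assert check(events, constraints) is None
```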
- Argues that temporal reasoning is not the core bottleneck; the real flaw lies in converting unstructured text into structured event representations.
- Introduces a Probabilistic Inconsistency Signal (PIS) that applies Evidential Deep Learning to LLM hidden states to separate perception errors from reasoning errors (see the sketch after this list).
- Achieves perfect 1.0 accuracy (4000/4000) on temporal arithmetic benchmarks and 75.1% on noise-injected QA with deterministic failure localization.
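The summary does not spell out the paper's exact PIS formulation, so the sketch below uses a standard Evidential Deep Learning construction (Dirichlet evidence, in the style of Sensoy et al., 2018) as a stand-in: a small head maps an LLM hidden state to class evidence, and the resulting belief and uncertainty are combined into an inconsistency score. The head architecture, hidden size, class set, and the final `pis` formula are all assumptions, not taken from the paper.

```python
import torch
import torch.nn as nn

class EvidentialPISHead(nn.Module):
    """Maps a hidden state to Dirichlet evidence over {consistent, inconsistent}.

    Illustrative sketch only: the real PIS head's architecture and classes
    are not described in this summary.
    """
    def __init__(self, hidden_dim: int = 4096, num_classes: int = 2):
        super().__init__()
        self.proj = nn.Linear(hidden_dim, num_classes)
        self.num_classes = num_classes

    def forward(self, h: torch.Tensor) -> dict[str, torch.Tensor]:
        evidence = torch.nn.functional.softplus(self.proj(h))  # e >= 0
        alpha = evidence + 1.0                                 # Dirichlet parameters
        strength = alpha.sum(dim=-1, keepdim=True)             # S = sum(alpha)
        prob = alpha / strength                                # expected class probs
        uncertainty = self.num_classes / strength              # u = K / S, in (0, 1]
        # One plausible reading of the PIS: probability mass on "inconsistent",
        # discounted when the head itself lacks evidence.
        pis = prob[..., 1:2] * (1.0 - uncertainty)
        return {"prob": prob, "uncertainty": uncertainty, "pis": pis}

head = EvidentialPISHead()
h = torch.randn(1, 4096)  # stand-in for an LLM hidden state
out = head(h)
print(out["pis"].item(), out["uncertainty"].item())
```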
Why It Matters
Reframes temporal QA from a reasoning challenge to a structural alignment problem, paving the way for reliable neuro-symbolic AI systems.