Research & Papers

Consistency-Guided Decoding with Proof-Driven Disambiguation for Three-Way Logical Question Answering

A new decoding layer fixes LLMs' logical contradictions and reduces 'Unknown' answers with just 4-5 extra calls.

Deep Dive

A team of researchers has introduced a new method called Consistency-Guided Decoding with Proof-Driven Disambiguation (CGD-PD) to fix critical logical reasoning failures in large language models (LLMs). The work targets three-way logical question answering, where models must label a hypothesis as True, False, or Unknown based on a set of premises. The authors identified two persistent failure modes: negation inconsistency, where a model gives contradictory answers to a statement and its direct negation, and epistemic 'Unknown,' where a model defaults to 'Unknown' even when the premises logically entail a definitive answer.
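To pin down what "contradictory" means here, the consistency condition implied by the three-way label set can be stated in a few lines. This is an illustrative sketch, assuming string labels and a helper name of our own choosing, not the paper's API:

```python
# Negation consistency over the label set {True, False, Unknown}:
# the label for not-H must be the logical flip of the label for H.
FLIP = {"True": "False", "False": "True", "Unknown": "Unknown"}

def negation_consistent(label_h: str, label_not_h: str) -> bool:
    """True iff the model's labels for H and not-H are logically compatible."""
    return FLIP[label_h] == label_not_h

assert not negation_consistent("True", "True")    # failure mode 1: contradiction
assert negation_consistent("Unknown", "Unknown")  # failure mode 2 passes this check,
# since abstaining on both H and not-H is consistent; it is still wrong whenever
# the premises entail a definitive answer, which is why a separate
# disambiguation step is needed.
```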

CGD-PD acts as a lightweight layer applied during inference. It first queries a model on both a hypothesis and its mechanically negated form, then projects the results onto a negation-consistent decision. If the outcome is still 'Unknown,' it triggers a proof-driven disambiguation step. This step uses targeted binary entailment probes—asking the model simpler yes/no questions derived from the original logic—to resolve the ambiguity. Remarkably, this entire process requires an average of only 4 to 5 additional calls to the underlying LLM.
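To make the control flow concrete, here is a minimal Python sketch of that loop. The helper names (ask_label, ask_entailment, negate) are hypothetical stand-ins for the underlying LLM prompts, and the projection rule below is one plausible reading of "projects the results onto a negation-consistent decision"; the paper's exact rule may differ.

```python
# Sketch of the CGD-PD inference loop under the assumptions stated above.
FLIP = {"True": "False", "False": "True", "Unknown": "Unknown"}

def ask_label(premises: str, hypothesis: str) -> str:
    """One LLM call returning a three-way label: 'True', 'False', or 'Unknown'."""
    raise NotImplementedError

def ask_entailment(premises: str, statement: str) -> bool:
    """One LLM call answering a binary probe: do the premises entail statement?"""
    raise NotImplementedError

def negate(hypothesis: str) -> str:
    """Mechanical negation of the hypothesis text."""
    return f"It is not the case that {hypothesis}"

def cgd_pd(premises: str, hypothesis: str) -> str:
    # Step 1: label both the hypothesis and its negation (2 calls).
    ans_pos = ask_label(premises, hypothesis)
    ans_neg = ask_label(premises, negate(hypothesis))

    # Step 2: project onto a negation-consistent decision.
    if FLIP[ans_neg] == ans_pos:
        decision = ans_pos              # already consistent
    elif ans_neg == "Unknown":
        decision = ans_pos              # trust the definitive side
    elif ans_pos == "Unknown":
        decision = FLIP[ans_neg]
    else:
        decision = "Unknown"            # hard contradiction: fall through

    # Step 3: proof-driven disambiguation via binary entailment probes,
    # triggered only when the projected decision is still 'Unknown'.
    if decision == "Unknown":
        if ask_entailment(premises, hypothesis):
            decision = "True"
        elif ask_entailment(premises, negate(hypothesis)):
            decision = "False"

    return decision
```

Note the call budget: two label queries plus at most two entailment probes in this sketch, which is in the same ballpark as the paper's reported average of 4 to 5 additional calls.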

On the first-order logic subsets of the FOLIO benchmark, CGD-PD delivered consistent accuracy gains across several frontier LLMs, with relative improvements of up to 16% over each base model's vanilla performance. The method also cut the number of unhelpful 'Unknown' predictions, pushing models toward definitive, logically sound conclusions. Because it operates entirely at inference time, it makes LLMs more reliable on complex, structured reasoning tasks without expensive retraining.

Key Points
  • Fixes two key LLM logic failures: negation inconsistency and epistemic 'Unknown' predictions.
  • Adds an average of only 4-5 extra model calls per query across the consistency check and disambiguation steps, keeping the method highly efficient.
  • Boosts accuracy by up to 16% (relative) on FOLIO's first-order logic subsets and reduces ambiguous 'Unknown' outputs.

Why It Matters

This makes LLMs more reliable for critical reasoning in law, finance, and research without costly model retraining.