Research & Papers

Why Someone Asked "Why": Foil Inference in Human and LLM Question Interpretation

New study reveals how we interpret 'why'—and where AI still falls short.

Deep Dive

Explanations are inherently contrastive: we explain why event E happened rather than some alternative E'. That alternative, or 'foil,' is rarely stated explicitly. A new study by Besch and Gerstenberg explores how humans infer the intended foil when someone asks 'why.' Participants read short stories and judged multiple possible foils on three dimensions: prior expectation (what they thought would happen), closeness (similarity to the actual outcome), and hindsight expectation (what could have happened instead). They also selected which foil they believed the question asker had in mind. The results were clear: hindsight expectation judgments best predicted human foil selection. This suggests that when we hear a 'why' question, we imagine what the asker finds surprising after the fact—not what they expected beforehand or what seems similar.
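The predictive claim can be sketched as a simple rule: pick the candidate foil with the highest hindsight-expectation rating. A minimal sketch, using made-up foils and ratings (not the study's actual materials):

```python
def predict_foil(ratings):
    """Return the foil with the highest hindsight-expectation rating.

    ratings: dict mapping each candidate foil to a rating (e.g. 0-1)
    of how expected that alternative seems in hindsight.
    """
    return max(ratings, key=ratings.get)

# Hypothetical example: someone asks "Why did the cake burn?"
# Candidate foils with invented hindsight-expectation ratings:
hindsight = {
    "the cake came out fine": 0.8,   # most expected alternative in hindsight
    "the cake was undercooked": 0.4,
    "the oven never turned on": 0.1,
}
print(predict_foil(hindsight))  # → the cake came out fine
```

Under the study's finding, this hindsight-based rule tracks human foil choices better than the same rule applied to prior-expectation or closeness ratings would.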

The study also tested several large language models (including GPT-4 and variants) on the same task. While LLMs produced reasonable expectation judgments, their foil selections were not consistently aligned with those judgments. In other words, they failed to reliably link what they 'think' could have happened to the actual foil implied by the question. This inconsistency matters: as LLMs are increasingly used in conversational AI, they must infer unstated contrasts to give satisfying explanations. The findings suggest current models lack a key component of human pragmatic reasoning: understanding that 'why' implicitly asks about a specific alternative that the asker finds surprising.
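The gap described above is a self-consistency question: does a model pick the foil that its own hindsight ratings rank highest? A minimal sketch of that check, with hypothetical ratings and choices:

```python
def consistency_rate(items):
    """Fraction of items where the chosen foil matches the foil
    the model's own ratings rank highest.

    items: list of (ratings_dict, chosen_foil) pairs.
    """
    hits = sum(chosen == max(ratings, key=ratings.get)
               for ratings, chosen in items)
    return hits / len(items)

# Invented example: two story items with a model's ratings and choices.
items = [
    ({"foil A": 0.9, "foil B": 0.2}, "foil A"),  # consistent choice
    ({"foil A": 0.3, "foil B": 0.8}, "foil A"),  # rates B highest, picks A
]
print(consistency_rate(items))  # → 0.5
```

A human-like reasoner would score near 1.0 on this measure; the study found LLMs fall short of that alignment.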

Key Points
  • Foil inference is crucial for interpreting 'why' questions: people assume an unspoken alternative event.
  • Hindsight expectation (what could have happened) best predicts human foil selection—not prior expectation or similarity.
  • LLMs show inconsistent mapping between their own expectation judgments and the inferred foil, revealing a gap in pragmatic reasoning.

Why It Matters

Improving AI's ability to infer unspoken contrasts is essential for natural, satisfying dialogue in assistants and chatbots.