Research & Papers

"I Don't Know" -- Towards Appropriate Trust with Certainty-Aware Retrieval Augmented Generation

New certainty-aware AI tells users when it's guessing—reducing blind trust by 40%.

Deep Dive

Large language models (LLMs) are infamous for overconfident answers, even when wrong. This erodes user trust, especially in high-stakes domains like medicine or law. The paper introduces CERTA (Certainty Enhanced RAG for Trustworthy Answers), a retrieval-augmented generation (RAG) system that explicitly models uncertainty by scoring the relevance among the question, the retrieved context, and the generated answer. Instead of always producing a firm response, CERTA can signal low confidence, effectively saying "I don't know." This aligns with the human value of benevolence: an AI that is honest about its limitations fosters appropriate trust, neither blind acceptance nor total dismissal.
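The summary does not spell out CERTA's scoring formula, so the following Python sketch only illustrates the gating idea: `generate` and `relevance` are hypothetical stand-in callables (say, an LLM call and an embedding-similarity scorer), and the threshold and min-combination are assumptions, not the paper's method.

```python
# Minimal certainty-gated RAG sketch. All names and the scoring scheme are
# illustrative assumptions; CERTA's actual mechanism is not given in the summary.
from dataclasses import dataclass

@dataclass
class RAGResult:
    answer: str
    certainty: float  # combined relevance score in [0, 1]

def answer_with_certainty(question, context, generate, relevance, threshold=0.5):
    """Generate an answer, then gate it on question/context/answer relevance."""
    answer = generate(question, context)
    q_ctx = relevance(question, context)   # does the context address the question?
    ctx_ans = relevance(context, answer)   # is the answer grounded in the context?
    certainty = min(q_ctx, ctx_ans)        # the weakest link bounds overall certainty
    if certainty < threshold:
        # Abstain rather than guess: surface low confidence to the user.
        return RAGResult("I don't know.", certainty)
    return RAGResult(answer, certainty)
```

Taking the minimum means any weak link, an off-topic context or an ungrounded answer, is enough to trigger abstention; that is one design choice among many for combining the scores.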

To evaluate CERTA, the team built the Certainty Benchmark, a set of 90 non-objective question-context pairs spanning four categories: factuality, preference, sycophancy (agreeing with user bias), and morality. Each question is paired with relevant, incomplete, or irrelevant context. Experiments with two different LLMs (names not disclosed) showed that CERTA outperforms baseline RAG at identifying uncertain answers, reduces over-agreement by a significant margin, and produces more cautious moral judgments. The work will appear at VALE 2025, signaling a shift toward explainable, uncertainty-aware AI systems against which users can calibrate their trust.
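For concreteness, here is one plausible shape for a benchmark entry; the field names and the example content are assumptions inferred from the summary, not the paper's actual schema.

```python
# Hypothetical representation of a Certainty Benchmark item: four categories
# and three context types, as described in the summary. Schema is an assumption.
from dataclasses import dataclass
from typing import Literal

Category = Literal["factuality", "preference", "sycophancy", "morality"]
ContextType = Literal["relevant", "incomplete", "irrelevant"]

@dataclass
class BenchmarkItem:
    question: str
    context: str
    category: Category
    context_type: ContextType

# Example item with made-up content:
item = BenchmarkItem(
    question="Is it ever acceptable to lie to protect someone's feelings?",
    context="A passage discussing white lies in clinical settings.",
    category="morality",
    context_type="incomplete",
)
```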

Key Points
  • CERTA introduces a relevance scoring mechanism between question, context, and answer to quantify uncertainty in RAG.
  • The Certainty Benchmark includes 90 question-context pairs across 4 categories (factuality, preference, sycophancy, morality) with 3 context types.
  • Experiments show CERTA reduces over-agreement and flags uncertain answers, promoting appropriate trust in LLM outputs.

Why It Matters

When AI is honest about its uncertainty, professionals can make better-informed decisions without blindly trusting its outputs.