KARL: Mitigating Hallucinations in LLMs via Knowledge-Boundary-Aware Reinforcement Learning
A new RL method teaches LLMs when to say 'I don't know' without sacrificing accuracy.
Researchers from Tsinghua University have developed KARL (Knowledge-Boundary-Aware Reinforcement Learning), a framework that reduces hallucinations in large language models (LLMs) by teaching them when it is appropriate to abstain from answering. Existing reinforcement learning methods rely on static reward mechanisms that are blind to a model's actual knowledge limits, so they often push the model toward excessive caution and hurt answer accuracy. KARL instead continuously aligns an LLM's abstention behavior with its evolving knowledge boundary, through two core innovations.
The first innovation is a Knowledge-Boundary-Aware Reward that estimates the model's knowledge boundary online from within-group response statistics. Rather than applying a fixed abstention penalty, the reward dynamically favors answering questions inside the estimated boundary and abstaining on those outside it. The second is a Two-Stage RL Training Strategy: the first stage explores the knowledge boundary and bypasses the "abstention trap" (where models learn to over-abstain), while the second converts incorrect answers beyond the boundary into abstentions without sacrificing accuracy. Extensive experiments show KARL achieves a superior accuracy-hallucination trade-off in both in-distribution and out-of-distribution settings, suppressing hallucinations while maintaining high accuracy.
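As a rough illustration, the sketch below implements one plausible reading of such a reward, assuming the within-group correctness rate serves as the online boundary estimate and a linear interpolation sets the abstention reward. The function name `karl_reward` and all constants are illustrative assumptions, not the paper's exact formulation.

```python
def karl_reward(group_correct: list[bool], is_correct: bool,
                abstained: bool, r_hit: float = 1.0,
                r_miss: float = -1.0) -> float:
    """Illustrative knowledge-boundary-aware reward (not the paper's exact form)."""
    # Online boundary estimate: the fraction of the sampled group that
    # answered this prompt correctly. Near 1.0 => the prompt likely lies
    # inside the model's knowledge boundary; near 0.0 => likely outside.
    p_hat = sum(group_correct) / len(group_correct)

    if is_correct:
        return r_hit  # always reward correct answers
    if abstained:
        # Abstention pays off on prompts the model usually misses and is
        # discouraged on prompts it usually gets right, so the incentive
        # tracks the estimated boundary instead of a fixed penalty.
        return (1.0 - p_hat) * r_hit + p_hat * r_miss
    return r_miss  # penalize confident wrong answers (hallucinations)


# Example: on a prompt the model answers correctly 1 time in 8,
# abstaining earns 0.75 while a wrong answer costs -1.0.
print(karl_reward([True] + [False] * 7, is_correct=False, abstained=True))
```

With `r_hit = 1` and `r_miss = -1` the abstention term reduces to `1 - 2 * p_hat`, so abstaining beats a wrong answer whenever `p_hat < 1`, while answering remains preferable in expectation whenever `p_hat > 0.5`.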
- KARL uses a Knowledge-Boundary-Aware Reward to dynamically estimate LLM knowledge limits via within-group response statistics, avoiding static penalties.
- The Two-Stage RL Training Strategy first explores the knowledge boundary to bypass the "abstention trap," then converts incorrect answers beyond it into abstentions (see the training-loop sketch after this list).
- Experiments show KARL achieves a superior accuracy-hallucination trade-off across multiple benchmarks, including out-of-distribution scenarios.
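The toy loop below ties the two stages together. Everything here is assumed for illustration: the `ToyPolicy` stub, the hard stage cutoff, and the choice to keep abstention neutral in stage 1 are one plausible reading of the strategy, not the paper's implementation.

```python
import random

# Toy stand-ins so the loop runs end to end; a real setup would use an
# actual LLM policy, a task grader, and a policy-gradient update (e.g.
# a GRPO-style step over the grouped responses).
GOLD = {"What is the capital of France?": "Paris"}
ABSTAIN = "I don't know"

class ToyPolicy:
    """Stub policy: sometimes abstains, otherwise answers with mixed accuracy."""
    def sample(self, prompt: str) -> str:
        roll = random.random()
        if roll < 0.2:
            return ABSTAIN
        return GOLD[prompt] if roll < 0.7 else "Lyon"

    def update(self, prompt, responses, rewards):
        pass  # placeholder for the actual RL update

def train_two_stage(policy, prompts, group_size=8,
                    stage1_steps=100, stage2_steps=100):
    """Two-stage schedule: stage 1 explores the boundary without rewarding
    abstention; stage 2 turns on the boundary-aware abstention reward."""
    for step in range(stage1_steps + stage2_steps):
        stage = 1 if step < stage1_steps else 2
        for prompt in prompts:
            # Group sampling: the correctness rate among answered
            # responses is the online knowledge-boundary estimate.
            responses = [policy.sample(prompt) for _ in range(group_size)]
            graded = [(r, r == GOLD[prompt]) for r in responses]
            answered = [ok for r, ok in graded if r != ABSTAIN]
            p_hat = sum(answered) / max(len(answered), 1)

            rewards = []
            for r, ok in graded:
                if ok:
                    rewards.append(1.0)   # correct answer
                elif r == ABSTAIN:
                    # Stage 1: abstention is kept neutral, so the policy
                    # cannot collapse into the "abstention trap" of
                    # refusing everything to farm reward. Stage 2:
                    # abstention pays off only on prompts that look
                    # outside the boundary (low p_hat).
                    rewards.append(0.0 if stage == 1 else 1.0 - 2.0 * p_hat)
                else:
                    rewards.append(-1.0)  # confident hallucination
            policy.update(prompt, responses, rewards)

train_two_stage(ToyPolicy(), list(GOLD))
```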
Why It Matters
KARL's dynamic abstention mechanism could make LLMs safer to deploy in high-stakes applications such as healthcare and law.