Mitigating hallucination [P]
Selective contrastive training cuts hallucination rate by about 6% relative to DPO while using roughly a tenth of the training data.
A Reddit user (Round_Apple2573) has introduced a lightweight method to mitigate hallucinations in large language models (LLMs) without relying on external judges, extra human labels, or heavy preference-learning pipelines. The core idea: let a frozen base model generate a 'bad' counterfactual answer, then train the adapted model to contrast the correct answer against that bad branch, starting only from the first token where the two diverge. Instead of updating on every sample, the method self-selects the cases where the bad continuation still receives too much support from the model.
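The training step could look roughly like the sketch below. This is a minimal reading of the post, not the author's code: the divergence search, the mean-log-prob "support" measure, the hinge-style loss, and the `margin` gate are all illustrative assumptions, and the actual repository may implement each differently.

```python
# Hypothetical sketch of a selective contrastive update on the diverging suffix.
# Function names and the gating rule are assumptions for illustration only.
import torch
import torch.nn.functional as F


def first_divergence(good_ids: torch.Tensor, bad_ids: torch.Tensor) -> int:
    """Index of the first token where the correct and counterfactual answers differ."""
    n = min(len(good_ids), len(bad_ids))
    diff = (good_ids[:n] != bad_ids[:n]).nonzero(as_tuple=True)[0]
    return int(diff[0]) if len(diff) else n


def selective_contrastive_loss(
    good_logprobs: torch.Tensor,  # adapted model's per-token log-probs on the correct answer
    bad_logprobs: torch.Tensor,   # adapted model's per-token log-probs on the frozen model's bad answer
    good_ids: torch.Tensor,
    bad_ids: torch.Tensor,
    margin: float = 0.5,          # assumed hyperparameter controlling how many examples update
):
    """Return a contrastive loss on the diverging suffix, or None if the example is skipped.

    An example triggers an update only when the bad branch still gets "too much
    support": its average log-prob after the divergence point is within `margin`
    of the correct branch's average log-prob.
    """
    d = first_divergence(good_ids, bad_ids)
    good_tail = good_logprobs[d:].mean()
    bad_tail = bad_logprobs[d:].mean()

    # Self-selection gate: skip examples the model already separates well.
    if (good_tail - bad_tail) > margin:
        return None

    # Hinge-style contrast: push the correct suffix up, the counterfactual suffix down.
    return F.relu(margin - (good_tail - bad_tail))


if __name__ == "__main__":
    # Toy example with made-up token ids and log-probs; diverges at index 2.
    good_ids = torch.tensor([5, 9, 2, 7, 3])
    bad_ids = torch.tensor([5, 9, 4, 1, 1])
    good_lp = torch.log(torch.tensor([0.9, 0.8, 0.5, 0.6, 0.7]))
    bad_lp = torch.log(torch.tensor([0.9, 0.8, 0.4, 0.5, 0.6]))

    loss = selective_contrastive_loss(good_lp, bad_lp, good_ids, bad_ids)
    print("update" if loss is not None else "skip", loss)
```

Under this reading, the ~10% update rate falls out of the gate: most examples already assign clearly higher probability to the correct suffix and are skipped, so gradients flow only where the model still confuses the two branches.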
In practice, only about 10% of training examples trigger updates, yet the model still improves factuality over standard cross-entropy (CE) training and DPO-style baselines. Compared to DPO, it showed about a 6% decrease in hallucination rate, and compared to SFT about a 1% decrease, while using only ~10% of the training data versus the full datasets used by DPO and SFT. The method also maintained consistent gains on out-of-distribution (OOD) datasets, suggesting it generalizes beyond the training benchmarks. The code is available on GitHub under genji970/hallucination-mitigation-via-contrastive-sampling-method.
- Only ~10% of training examples trigger updates, cutting the number of gradient updates by roughly 90%
- Outperforms DPO by about 6% and SFT by about 1% on factuality while using less data
- Gains hold on out-of-distribution datasets, not just training benchmarks
Why It Matters
A data-efficient, compute-light approach to reduce hallucinations could make LLMs more reliable for enterprise deployment.