Research & Papers

PND framework slashes hallucination in VLMs without retraining

Training-free decoding technique boosts visual grounding by up to 15% on standard benchmarks.

Deep Dive

Object hallucination—where models describe objects not present in an image—remains a persistent flaw in Vision-Language Models (VLMs). A new paper from researchers at Beihang University, accepted at CVPR 2026, diagnoses the root cause as an attention imbalance: visual features are consistently under-weighted compared to linguistic priors. To fix this, they propose Positive-and-Negative Decoding (PND), a training-free inference framework that directly intervenes during the decoding step. PND creates two parallel decoding paths: a positive path that amplifies visual signals from the encoder, and a negative path that constructs counterfactual outputs biased toward language priors. By contrasting logits from both paths, PND steers generation toward visually grounded results.
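The logit contrast described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not PND's published formula. It assumes a contrastive-decoding rule of the form score = (1 + alpha) * pos - alpha * neg, plus a plausibility cutoff `beta` that masks tokens the positive path already rates as very unlikely. The names `contrastive_logits`, `greedy_pick`, `alpha`, and `beta` are hypothetical. In a real system, the two logit vectors would come from two forward passes of the same VLM at each step: one with amplified visual features and one biased toward language priors.

```python
import math


def log_softmax(logits):
    """Numerically stable log-softmax over a list of floats."""
    m = max(logits)
    log_z = m + math.log(sum(math.exp(x - m) for x in logits))
    return [x - log_z for x in logits]


def contrastive_logits(pos_logits, neg_logits, alpha=1.0, beta=0.1):
    """Combine positive-path and negative-path logits for one decoding step.

    Hypothetical rule (assumed, not taken from the PND paper):
        score = (1 + alpha) * pos - alpha * neg
    Tokens whose positive-path probability is below beta * (max probability)
    are masked to -inf, so the contrast cannot promote tokens that the
    visually amplified path considers implausible.
    """
    if len(pos_logits) != len(neg_logits):
        raise ValueError("both paths must score the same vocabulary")
    if not 0.0 < beta <= 1.0:
        raise ValueError("beta must be in (0, 1]")
    pos_logp = log_softmax(pos_logits)
    cutoff = math.log(beta) + max(pos_logp)  # log of beta * max probability
    scores = []
    for lp, p, n in zip(pos_logp, pos_logits, neg_logits):
        if lp < cutoff:
            scores.append(float("-inf"))  # implausible under the visual path
        else:
            scores.append((1 + alpha) * p - alpha * n)
    return scores


def greedy_pick(scores):
    """Return the index of the highest-scoring token."""
    return max(range(len(scores)), key=lambda i: scores[i])


if __name__ == "__main__":
    vocab = ["dog", "frisbee", "grass"]
    pos = [1.8, 2.0, 0.5]  # visually amplified path, prior still leaking in
    neg = [0.5, 2.5, 0.4]  # prior-dominant path strongly expects "frisbee"
    print(vocab[greedy_pick(pos)])                           # frisbee
    print(vocab[greedy_pick(contrastive_logits(pos, neg))])  # dog
```

In this toy step, the positive path alone would choose `frisbee`, a token the prior-dominant path favors even more strongly. Subtracting the negative path penalizes that prior-driven preference and shifts the choice to `dog`. The cutoff is a common safeguard in contrastive decoding, because a raw difference of logits can otherwise reward tokens that both paths consider unlikely.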

Experiments show PND achieves state-of-the-art performance on three standard hallucination benchmarks, POPE (object existence), MME (perception), and CHAIR (caption hallucination), without any retraining or fine-tuning. The method is model-agnostic and can be applied to any autoregressive VLM, and the authors have released code on GitHub. For enterprise teams deploying VLMs in high-stakes settings such as medical imaging or autonomous driving, PND offers a drop-in way to reduce hallucination rates while preserving model capabilities. Because it leaves model weights untouched, the main cost is running two decoding paths at inference time. The paper runs 11 pages with 5 figures and is available on arXiv (2605.06679).
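CHAIR, one of the benchmarks above, scores generated captions by counting mentioned objects that do not appear in the image's ground-truth annotations. It reports two ratios, where lower is better for both. CHAIR_i is the fraction of mentioned objects that are hallucinated, and CHAIR_s is the fraction of captions that contain at least one hallucinated object. The sketch below is simplified. It assumes object mentions have already been extracted and normalized to ground-truth label names, and it treats each caption's mentions as a set. The reference tooling also handles extraction and synonym matching, which this sketch skips. The function name `chair_scores` is hypothetical.

```python
def chair_scores(mentioned_per_caption, truth_per_image):
    """Return (CHAIR_i, CHAIR_s) for a batch of generated captions.

    mentioned_per_caption: list of sets of object labels each caption mentions
    truth_per_image: list of sets of ground-truth object labels, same order
    Assumes mentions are already normalized to ground-truth label names.
    """
    if len(mentioned_per_caption) != len(truth_per_image):
        raise ValueError("need one ground-truth set per caption")
    total_mentions = 0
    hallucinated_mentions = 0
    captions_with_hallucination = 0
    for mentioned, truth in zip(mentioned_per_caption, truth_per_image):
        hallucinated = mentioned - truth  # objects named but absent from the image
        total_mentions += len(mentioned)
        hallucinated_mentions += len(hallucinated)
        if hallucinated:
            captions_with_hallucination += 1
    chair_i = hallucinated_mentions / total_mentions if total_mentions else 0.0
    chair_s = (captions_with_hallucination / len(mentioned_per_caption)
               if mentioned_per_caption else 0.0)
    return chair_i, chair_s


if __name__ == "__main__":
    captions = [{"dog", "frisbee"}, {"cat"}]
    truth = [{"dog", "grass"}, {"cat", "sofa"}]
    print(chair_scores(captions, truth))  # (0.3333333333333333, 0.5)
```

Here, one of the three mentioned objects (`frisbee`) is hallucinated, which gives CHAIR_i = 1/3. One of the two captions contains a hallucination, which gives CHAIR_s = 0.5.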

Key Points
  • PND is a training-free decoding method that fixes hallucination by contrasting positive (visual-amplified) and negative (prior-dominant) paths.
  • Achieves state-of-the-art results on POPE, MME, and CHAIR benchmarks without retraining.
  • Accepted at CVPR 2026; code is open-source, and the method is model-agnostic, applicable to any autoregressive VLM.

Why It Matters

A training-free, drop-in fix for VLM hallucination makes AI vision systems more reliable in real-world use.