Egocentric Bias in Vision-Language Models
A new study finds that roughly 75% of vision-language model errors reflect an 'egocentric bias': the models default to their own camera viewpoint.
Researchers led by Dezhi Luo introduced FlipSet, a diagnostic benchmark for Level-2 visual perspective taking (L2 VPT). Testing 103 vision-language models (VLMs) revealed a systematic egocentric bias, with the vast majority performing below chance. Roughly three-quarters of errors reproduced the model's own camera viewpoint, exposing a fundamental compositional deficit: the models fail to bind social awareness to spatial operations, despite high accuracy on each component task in isolation.
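As an illustration only (not the paper's actual evaluation code), here is a minimal sketch of how a finding like this could be quantified: each benchmark item is scored against both the correct other-perspective label and the label the scene would yield from the camera's own viewpoint, and errors matching the latter are counted as egocentric. The `EvalRecord` structure and field names are hypothetical.

```python
from dataclasses import dataclass

@dataclass
class EvalRecord:
    # Hypothetical record for one L2-VPT benchmark item (not FlipSet's real schema).
    model_answer: str      # what the VLM answered
    correct_label: str     # answer from the depicted agent's perspective
    egocentric_label: str  # answer from the camera's own viewpoint

def summarize(records: list[EvalRecord], chance_level: float = 0.5) -> dict:
    """Compute accuracy and the share of errors that match the camera viewpoint."""
    total = len(records)
    correct = sum(r.model_answer == r.correct_label for r in records)
    errors = [r for r in records if r.model_answer != r.correct_label]
    egocentric_errors = sum(r.model_answer == r.egocentric_label for r in errors)
    return {
        "accuracy": correct / total if total else 0.0,
        "below_chance": (correct / total) < chance_level if total else False,
        "egocentric_error_share": egocentric_errors / len(errors) if errors else 0.0,
    }

# Toy example: 4 items, 3 wrong answers, 2 of which match the camera viewpoint.
records = [
    EvalRecord("left", "left", "right"),
    EvalRecord("right", "left", "right"),
    EvalRecord("right", "left", "right"),
    EvalRecord("up", "down", "left"),
]
print(summarize(records))
# -> accuracy 0.25 (below chance), ~0.67 of errors classified as egocentric
```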
Why It Matters
This reveals a core limitation in AI social reasoning, with implications for robotics, assistive technology, and human-AI interaction.