New study reveals AI models trained on natural images best mimic human face perception
864 participants judged face similarity to reveal which AI models perceive faces the way humans do.
A new study from Guo et al. tackles a core question in computational neuroscience: what mechanisms drive human face perception? The team compared six deep neural networks that shared the same architecture but were trained on distinct objectives, including inverse rendering, face identification, and object classification. Each network served as a competing hypothesis about face perception. To expose subtle differences between these hypotheses, the researchers created 'controversial' face pairs designed to elicit conflicting similarity predictions from the models, and tested them alongside random pairs. They then collected similarity judgments from 864 human participants using both photorealistic and synthetic stimuli.
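To make the idea of a 'controversial' pair concrete, here is a minimal sketch, not the study's actual procedure. It assumes each model yields an embedding vector per face and simply picks, from a fixed pool of faces, the pairs on which two models disagree most about dissimilarity. The function names, the Euclidean distance, and the z-scoring step are all illustrative assumptions; the summary above says the study's pairs were designed to elicit conflicting predictions but does not say how.

```python
import numpy as np


def pair_distances(embeddings):
    """Euclidean distance between every pair of faces (one face per row)."""
    diff = embeddings[:, None, :] - embeddings[None, :, :]
    return np.linalg.norm(diff, axis=-1)


def most_controversial_pairs(emb_a, emb_b, k=3):
    """Return the k face pairs (i, j) on which two models disagree most.

    Hypothetical helper: assumes both arrays have shape (n_faces, n_dims)
    and that the pairwise distances within each model are not all identical.
    """
    n = emb_a.shape[0]
    i_idx, j_idx = np.triu_indices(n, k=1)  # each unordered pair once
    d_a = pair_distances(emb_a)[i_idx, j_idx]
    d_b = pair_distances(emb_b)[i_idx, j_idx]

    # z-score so the two models' distance scales are comparable
    z_a = (d_a - d_a.mean()) / d_a.std()
    z_b = (d_b - d_b.mean()) / d_b.std()

    # large gap: one model calls the pair similar, the other dissimilar
    disagreement = np.abs(z_a - z_b)
    top = np.argsort(disagreement)[::-1][:k]
    return [(int(i_idx[t]), int(j_idx[t])) for t in top]
```

Pairs selected this way are the most informative to show to people, because whichever judgment they give, it favors one model over the other.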
The results were striking: models trained to prioritize high-level, invariant structures (inverse rendering, face identification, and object classification) aligned most closely with human judgments. Moreover, models trained on natural images consistently outperformed those trained on synthetic data. This implies that human face perception is shaped by mechanisms that discount nuisance variation (like lighting or pose), infer latent causes of facial appearance, and are finely tuned to the statistics of natural images. The work provides a rigorous framework for testing perceptual theories and offers clues for building more human-like AI vision systems.
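One simple way to score how closely a model "aligns" with human judgments is to correlate its predicted dissimilarities with the human ratings across the same face pairs, then rank the models. The sketch below assumes Pearson correlation and a hypothetical `rank_models_by_alignment` helper; the summary does not state which alignment metric the study used, so treat this as an illustration only.

```python
import numpy as np


def rank_models_by_alignment(human_scores, model_scores):
    """Sort models by Pearson correlation with human dissimilarity ratings.

    human_scores: one rating per face pair.
    model_scores: dict mapping a model name to its predictions for the
                  same pairs, in the same order (hypothetical format).
    """
    human = np.asarray(human_scores, dtype=float)
    alignment = {
        name: float(np.corrcoef(human, np.asarray(pred, dtype=float))[0, 1])
        for name, pred in model_scores.items()
    }
    # highest correlation first
    return sorted(alignment.items(), key=lambda item: item[1], reverse=True)


# Toy numbers for illustration only; these are not results from the study.
toy_human = [1.0, 2.0, 3.0, 4.0]
toy_models = {
    "model_x": [1.1, 2.0, 2.9, 4.2],  # tracks the human ratings closely
    "model_y": [4.0, 1.0, 3.0, 2.0],  # largely unrelated ordering
}
ranking = rank_models_by_alignment(toy_human, toy_models)
```

In this toy case `model_x` ranks first. On real data, a rank-based correlation or a comparison against a noise ceiling would be reasonable alternatives to Pearson.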
- Compared 6 neural network models (same architecture, different training tasks) against similarity judgments from 864 human participants.
- Used 'controversial' face pairs optimized to reveal diagnostic differences between model predictions.
- Best-performing models: inverse rendering, face identification, and object classification; natural image training outperformed synthetic.
Why It Matters
The findings offer insights for improving AI facial recognition systems and for understanding the computational principles of human vision.