AI Safety

PictoPercept toolkit reveals how GPT-5 amplifies human bias in earnings perceptions

Open-source tool compares human and GPT-5 earnings perceptions against real labor data, and GPT-5's biases turn out stronger than humans'.

Deep Dive

A team of 15 researchers from multiple institutions has unveiled PictoPercept, an open-source toolkit that measures bias in both humans and AI systems using visual forced-choice comparisons. Participants view pairs of normed facial photographs and judge which person is more likely to have higher earnings. Their selections are then compared against actual U.S. Bureau of Labor Statistics data. The tool was validated with a nationally representative sample of 283 American adults and was also applied to OpenAI's GPT-5 model using identical stimuli. The approach addresses limitations of traditional bias measurement tools, which struggle to capture intersectional identities, cannot evaluate AI systems, lack grounding in demographic reality, and remain vulnerable to social desirability effects.
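To make the comparison step concrete, here is a minimal sketch of how forced-choice selections could be scored against an earnings benchmark. It is not PictoPercept's actual code: the function names, the group labels "A", "B", and "C", and the benchmark figures are hypothetical placeholders, not BLS data or study results. The rank-gap score is one simple choice of statistic, and the paper may use a different one.

```python
from collections import defaultdict


def selection_rates(trials):
    """Share of appearances in which each group was picked as the higher earner.

    trials: iterable of (group_a, group_b, chosen_group) tuples (hypothetical format).
    """
    shown = defaultdict(int)
    picked = defaultdict(int)
    for group_a, group_b, chosen in trials:
        shown[group_a] += 1
        shown[group_b] += 1
        picked[chosen] += 1
    return {group: picked[group] / shown[group] for group in shown}


def rank(values):
    """Map each key to its rank, where 1 is the highest value."""
    ordered = sorted(values, key=values.get, reverse=True)
    return {group: position + 1 for position, group in enumerate(ordered)}


def perception_gap(trials, benchmark):
    """Actual rank minus perceived rank for each group.

    Positive: the group is perceived as earning more than the benchmark says.
    Negative: the group's earnings are underestimated.
    """
    rates = selection_rates(trials)
    perceived = rank(rates)
    actual = rank({group: benchmark[group] for group in rates})
    return {group: actual[group] - perceived[group] for group in rates}


# Placeholder inputs for illustration only (assumed, not real data).
example_trials = [("A", "B", "B"), ("A", "C", "C"), ("B", "C", "B")]
example_benchmark = {"A": 3, "B": 2, "C": 1}  # arbitrary units
example_gaps = perception_gap(example_trials, example_benchmark)
```

Because the same scoring function accepts any list of choices, the selections made by human participants and those returned by a model for identical image pairs could be run through it separately and compared group by group.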

The study's findings reveal systematic misperceptions diverging from demographic reality. Participants dramatically underestimated Asian American earnings despite this group having the highest actual earnings, while overestimating Latino male and White male earnings. Notably, ingroup favoritism was not universal: White males showed clear ingroup bias, but Asian participants actually underestimated their own group's earnings. Critically, GPT-5 exhibited substantially stronger biases than humans, with stark systematic underestimation of all female groups. These results suggest that PictoPercept enables unified bias assessment across human and AI systems, providing a grounded, empirical method for uncovering harmful stereotypes. The paper has been accepted for presentation at the 76th Annual International Communication Association Conference in Cape Town.

Key Points
  • PictoPercept compares perceptions against actual U.S. Bureau of Labor Statistics data using facial photograph pairs.
  • 283 American adults tested; GPT-5 showed stronger biases than humans, systematically underestimating all female groups.
  • Asian American earnings were dramatically underestimated even though this group has the highest actual earnings; Asian participants also underestimated their own group.

Why It Matters

PictoPercept offers a standardized, open-source benchmark for auditing bias in both human decision-making and AI models like GPT-5.
