AI Safety

CelebA audit reveals gendered double standards in facial AI datasets

AI models penalize women for ageing while excluding older men entirely

Deep Dive

A new study by Sieun Park and Yuanmo He (arXiv:2605.15312) presents a three-level audit of representational harm in the widely used CelebA facial dataset. The researchers analyzed 202,599 images and 39 attributes using hierarchical clustering, XGBoost with SHAP, and Grad-CAM attention maps. They found that the dataset's labels organize into two cultural archetypes: "performative femininity" (youth, makeup, adornment) and "professional masculinity" (ageing, facial hair, formal attire). Female faces are rated attractive more often overall, but they incur steep penalties when they fall into ageing or masculine-coded clusters. For example, adiposity (fatness) lowers attractiveness ratings only for female faces.
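The gender-specific effect described above can be illustrated with a much simpler descriptive check than the SHAP analysis the study uses: compare the share of faces labeled attractive with and without an attribute, separately for each gender. The sketch below is not the authors' method. The rows are synthetic toy data, `attractive_rate_gap` and `make_rows` are hypothetical helpers, and treating CelebA's `Chubby` label as the paper's "adiposity" is an assumption.

```python
from collections import defaultdict


def attractive_rate_gap(rows, attribute, group_key="Male", target="Attractive"):
    """For each group, return P(target=1 | attribute=1) - P(target=1 | attribute=0)."""
    # group value -> attribute value -> [number of positives, number of rows]
    counts = defaultdict(lambda: {0: [0, 0], 1: [0, 0]})
    for row in rows:
        cell = counts[row[group_key]][row[attribute]]
        cell[0] += row[target]
        cell[1] += 1

    gaps = {}
    for group, by_attr in counts.items():
        (pos0, n0), (pos1, n1) = by_attr[0], by_attr[1]
        # A gap is undefined if one side of the comparison has no rows.
        gaps[group] = None if n0 == 0 or n1 == 0 else pos1 / n1 - pos0 / n0
    return gaps


def make_rows(male, chubby, n_attractive, n_total):
    """Build synthetic rows: the first n_attractive rows are labeled attractive."""
    return [
        {"Male": male, "Chubby": chubby, "Attractive": 1 if i < n_attractive else 0}
        for i in range(n_total)
    ]


# Synthetic toy data (not CelebA): the attribute lowers the attractive rate
# for female rows only, mirroring the pattern the article reports.
toy_rows = (
    make_rows(male=0, chubby=0, n_attractive=3, n_total=4)
    + make_rows(male=0, chubby=1, n_attractive=1, n_total=4)
    + make_rows(male=1, chubby=0, n_attractive=2, n_total=4)
    + make_rows(male=1, chubby=1, n_attractive=2, n_total=4)
)

gaps = attractive_rate_gap(toy_rows, "Chubby")
print(gaps)
```

A raw rate gap like this ignores correlations between attributes, which is one reason an audit would prefer a model-based attribution method such as SHAP.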

At the model level, Grad-CAM analysis showed that predictions for female and younger male subgroups focus on mid-face cues (eyes, nose, mouth), while predictions for older males drift toward peripheral cues such as hair and clothing. Older males achieve the highest accuracy but the lowest average precision. That combination is what one would expect when positive labels are rare in a subgroup: a model can be correct most of the time by predicting the negative class while ranking the few positive cases poorly. The authors read it as categorical exclusion from the dataset's evaluative templates. They argue that standard fairness metrics, which focus on performance disparities, mask two distinct representational harms: hyper-scrutiny of women under a narrow template, and the exclusion of older men altogether. The paper calls on fairness research to address representational harm, not just accuracy disparities.
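To see how a subgroup can score highest on accuracy and lowest on average precision at the same time, here is a minimal sketch in plain Python. The labels and scores are synthetic, the 0.5 threshold is an assumption, and `accuracy` and `average_precision` are illustrative helpers, not the paper's evaluation code. Average precision is computed as the mean of precision at each rank where a true positive appears, with no special handling of tied scores.

```python
def accuracy(y_true, scores, threshold=0.5):
    """Fraction of samples whose thresholded score matches the label."""
    preds = [1 if s >= threshold else 0 for s in scores]
    return sum(p == y for p, y in zip(preds, y_true)) / len(y_true)


def average_precision(y_true, scores):
    """Mean of precision@rank over the ranks at which a positive appears."""
    ranked = sorted(zip(scores, y_true), key=lambda pair: pair[0], reverse=True)
    hits = 0
    precision_sum = 0.0
    for rank, (_, label) in enumerate(ranked, start=1):
        if label == 1:
            hits += 1
            precision_sum += hits / rank
    return precision_sum / hits if hits else float("nan")


# Synthetic subgroup A: positives are rare (2 of 20) and scored below every negative.
rare_labels = [1, 1] + [0] * 18
rare_scores = [0.05, 0.10] + [0.20 + 0.01 * i for i in range(18)]

# Synthetic subgroup B: balanced labels, mostly well ranked, a few threshold errors.
balanced_labels = [1] * 10 + [0] * 10
balanced_scores = [0.90, 0.85, 0.80, 0.75, 0.70, 0.65, 0.60, 0.55, 0.45, 0.40,
                   0.58, 0.52, 0.30, 0.25, 0.20, 0.15, 0.10, 0.08, 0.05, 0.02]

acc_rare = accuracy(rare_labels, rare_scores)
ap_rare = average_precision(rare_labels, rare_scores)
acc_balanced = accuracy(balanced_labels, balanced_scores)
ap_balanced = average_precision(balanced_labels, balanced_scores)

print(f"rare positives: accuracy={acc_rare:.2f}, AP={ap_rare:.3f}")
print(f"balanced:       accuracy={acc_balanced:.2f}, AP={ap_balanced:.3f}")
```

In this toy setup the rare-positive subgroup wins on accuracy because every prediction is negative, yet it loses badly on average precision because the model never ranks a positive case near the top. An audit that reports only accuracy gaps would miss that.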

Key Points
  • Hierarchical clustering of attribute labels across 202,599 CelebA images revealed latent trait bundles: 'performative femininity' (youth, makeup, adornment) and 'professional masculinity' (ageing, facial hair, formal attire)
  • SHAP analysis showed gender-specific effects: adiposity lowers attractiveness ratings only for female faces, which also incur steep penalties in ageing or masculine-coded clusters
  • Grad-CAM attention maps showed model focus shifting from mid-face cues (females, younger males) to peripheral cues such as hair and clothing (older males); older men get the highest accuracy but the lowest average precision, which the authors read as exclusion from the evaluative template

Why It Matters

Fairness audits must move beyond accuracy gaps to catch hidden representational harms in facial AI datasets.