

CelebA audit reveals gendered double standards in facial AI datasets

arXiv cs.CY May 18, 2026

AI models penalize women for ageing while excluding older men entirely

Deep Dive

A new study by Sieun Park and Yuanmo He (arXiv:2605.15312) provides a three-level audit of representational harm in the widely used CelebA facial dataset. The researchers analyzed 202,599 images and 39 attributes using hierarchical clustering, XGBoost with SHAP, and Grad-CAM attention maps. They found that the dataset's labels organize into cultural archetypes: "performative femininity" (youth, makeup, adornment) and "professional masculinity" (ageing, facial hair, formal attire). Female faces are labeled attractive more often overall, but incur steep penalties when they fall into ageing or masculine-coded clusters. The SHAP analysis shows the same asymmetry at the attribute level: adiposity (fatness) reduces attractiveness only for female faces.
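The summary does not say which distance or linkage the authors used to cluster CelebA's attribute labels. As a minimal sketch, assuming Jaccard distance between binary attribute columns and average linkage (both assumptions, not the paper's confirmed settings), the code below groups attributes that tend to co-occur on the same images. The attribute names are real CelebA labels, but the eight-image label matrix and the helpers `jaccard_distance` and `cluster_attributes` are hypothetical.

```python
from itertools import combinations


def jaccard_distance(a, b):
    """1 - |a AND b| / |a OR b| for two equal-length 0/1 columns."""
    both = sum(1 for x, y in zip(a, b) if x and y)
    either = sum(1 for x, y in zip(a, b) if x or y)
    return 0.0 if either == 0 else 1.0 - both / either


def cluster_attributes(labels, names, k):
    """Naive average-linkage agglomerative clustering of attribute columns.

    labels: one row per image, each a list of 0/1 attribute flags.
    Returns k clusters, each a sorted list of attribute names.
    """
    columns = [[row[j] for row in labels] for j in range(len(names))]
    dist = {(i, j): jaccard_distance(columns[i], columns[j])
            for i, j in combinations(range(len(names)), 2)}
    clusters = [[i] for i in range(len(names))]
    while len(clusters) > k:
        best = None
        for a, b in combinations(range(len(clusters)), 2):
            # Average linkage: mean pairwise distance between the two clusters.
            pairs = [(min(i, j), max(i, j)) for i in clusters[a] for j in clusters[b]]
            avg = sum(dist[p] for p in pairs) / len(pairs)
            if best is None or avg < best[0]:
                best = (avg, a, b)
        _, a, b = best
        clusters[a] = clusters[a] + clusters[b]
        del clusters[b]  # b > a, so index a is unaffected
    return sorted(sorted(names[i] for i in c) for c in clusters)


# Hypothetical toy labels for eight images (not CelebA data).
NAMES = ["Young", "Heavy_Makeup", "Wearing_Lipstick", "Gray_Hair", "Goatee", "Wearing_Necktie"]
TOY_LABELS = [
    [1, 1, 1, 0, 0, 0],
    [1, 1, 1, 0, 0, 0],
    [1, 1, 0, 0, 0, 0],
    [1, 0, 1, 0, 0, 0],
    [0, 0, 0, 1, 1, 1],
    [0, 0, 0, 1, 1, 0],
    [0, 0, 0, 1, 0, 1],
    [0, 0, 0, 0, 1, 1],
]

print(cluster_attributes(TOY_LABELS, NAMES, k=2))
# -> [['Goatee', 'Gray_Hair', 'Wearing_Necktie'], ['Heavy_Makeup', 'Wearing_Lipstick', 'Young']]
```

With this toy matrix, makeup and lipstick co-occur with `Young` while grey hair, goatees and neckties co-occur with each other, so asking for two clusters separates the two bundles. On the full dataset, a library routine such as SciPy's hierarchical clustering would be a more practical choice than this naive loop.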

At the model level, Grad-CAM analysis showed that predictions for female and younger male subgroups focus on mid-face cues (eyes, nose, mouth), while predictions for older males drift toward peripheral cues such as hair and clothing. Older males have the highest accuracy of any subgroup but the lowest average precision, which the authors interpret as categorical exclusion from the dataset's evaluative templates. The authors argue that standard fairness metrics, which focus on performance disparities, mask two representational harms: hyper-scrutiny of women under a narrow template and the exclusion of older men from it entirely. The paper calls for fairness research to address representational harm beyond simple accuracy disparities.
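High accuracy paired with low average precision is a familiar signature of a subgroup with very few positive labels: predicting "not attractive" for everyone is usually right, yet the rare positives are ranked poorly. Whether this is the exact mechanism in the paper is an assumption here, since the summary reports only the two metrics. The minimal sketch below uses invented scores for two hypothetical subgroups and an assumed 0.5 decision threshold; `average_precision` is the mean of precision at each positive's rank, assuming no tied scores.

```python
def accuracy(y_true, scores, threshold=0.5):
    """Share of examples whose thresholded score matches the 0/1 label."""
    preds = [1 if s >= threshold else 0 for s in scores]
    return sum(p == y for p, y in zip(preds, y_true)) / len(y_true)


def average_precision(y_true, scores):
    """Mean precision@k over the ranks k at which positives appear (assumes no tied scores)."""
    ranked = sorted(zip(scores, y_true), key=lambda pair: pair[0], reverse=True)
    hits, total = 0, 0.0
    for k, (_, y) in enumerate(ranked, start=1):
        if y == 1:
            hits += 1
            total += hits / k
    return total / hits if hits else 0.0


# Hypothetical subgroup with many positives that the model ranks well.
YOUNGER_Y = [1, 1, 1, 1, 1, 0, 0, 0, 0, 0]
YOUNGER_S = [0.90, 0.80, 0.70, 0.45, 0.40, 0.60, 0.35, 0.30, 0.20, 0.10]

# Hypothetical subgroup with a single positive, scored below two negatives.
OLDER_Y = [1, 0, 0, 0, 0, 0, 0, 0, 0, 0]
OLDER_S = [0.30, 0.45, 0.40, 0.20, 0.15, 0.12, 0.10, 0.08, 0.05, 0.02]

for name, y, s in [("younger", YOUNGER_Y, YOUNGER_S), ("older", OLDER_Y, OLDER_S)]:
    print(f"{name}: accuracy={accuracy(y, s):.2f}  AP={average_precision(y, s):.3f}")
# younger: accuracy=0.70  AP=0.927
# older: accuracy=0.90  AP=0.333
```

scikit-learn's `average_precision_score` uses the same step-wise definition, so it should agree with this helper when scores are untied.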

Key Points

Hierarchical clustering of 202,599 CelebA images revealed latent trait bundles: 'performative femininity' (youth, makeup) and 'professional masculinity' (ageing, facial hair)
SHAP analysis showed gender-specific effects: adiposity reduces attractiveness only for females, while older males are excluded from the evaluative template entirely
Grad-CAM attention maps found model focus shifts from mid-face (females/young males) to peripheral cues (older males), causing high accuracy but low average precision for older men
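Grad-CAM weights each feature map of a convolutional layer by the spatial average of the class score's gradient with respect to that map, sums the weighted maps, and applies a ReLU. The sketch below assumes the activations and gradients have already been extracted from a trained network (the summary does not name the architecture or layer). It uses a hypothetical `central_share` helper, with a plain central box standing in for the mid-face region, to quantify a shift between central and peripheral attention. The 4×4 arrays are invented.

```python
import numpy as np


def grad_cam(activations, gradients):
    """Grad-CAM map for one conv layer.

    activations: array (K, H, W), the layer's K feature maps.
    gradients:   array (K, H, W), d(class score)/d(activations).
    """
    weights = gradients.mean(axis=(1, 2))             # alpha_k: spatially averaged gradients
    cam = np.tensordot(weights, activations, axes=1)  # sum_k alpha_k * A_k, shape (H, W)
    return np.maximum(cam, 0.0)                       # ReLU keeps positive evidence only


def central_share(cam, margin):
    """Fraction of map mass inside the box left after trimming `margin` (>= 1) cells
    per side; a hypothetical stand-in for a mid-face region."""
    total = cam.sum()
    inner = cam[margin:-margin, margin:-margin].sum()
    return float(inner / total) if total > 0 else 0.0


# Toy layer with two feature maps: one fires at the centre, one on the border.
acts = np.zeros((2, 4, 4))
acts[0, 1:3, 1:3] = 1.0   # "central" feature
acts[1] = 1.0
acts[1, 1:3, 1:3] = 0.0   # "peripheral" feature

grads_central = np.stack([np.full((4, 4), 1.0), np.full((4, 4), 0.25)])
grads_peripheral = np.stack([np.full((4, 4), 0.1), np.full((4, 4), 1.0)])

print(round(central_share(grad_cam(acts, grads_central), margin=1), 3))     # 0.571
print(round(central_share(grad_cam(acts, grads_peripheral), margin=1), 3))  # 0.032
```

A drop in the central share, as between the two gradient settings here, is the kind of shift the paper describes for older men. In practice the gradients would come from an autograd framework, and the map would be upsampled to image resolution before comparing facial regions.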

Why It Matters

Fairness audits must move beyond accuracy gaps to catch hidden representational harms in facial AI datasets.
