Small Changes, Big Impact: Demographic Bias in LLM-Based Hiring Through Subtle Sociocultural Markers in Anonymised Resumes
Even with names removed, AI models favor Chinese and Caucasian male candidates based on hobbies and languages.
A research team led by Bryan Chen Zhengyu Tan has published a groundbreaking study demonstrating that Large Language Models (LLMs) used in hiring perpetuate demographic bias even when resumes are anonymized. The researchers built a stress-test framework grounded in the Singapore context: starting from 100 neutral, job-aligned resumes, they generated 4,100 variants that differed only in subtle sociocultural markers such as languages spoken, co-curricular activities, volunteering, and hobbies. These markers served as proxies for four ethnicities and two genders, allowing the team to systematically evaluate bias in 18 different LLMs across realistic screening scenarios.
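The variant-generation idea can be sketched in a few lines. Everything below is illustrative: the marker strings, group labels, and base resume are hypothetical stand-ins, not the study's actual data, which held each resume's qualifications fixed while swapping only the proxy markers.

```python
# Hypothetical sketch of marker-swap variant generation.
# Marker text and group labels are illustrative, NOT the paper's real lists.
from itertools import product

# Language markers as ethnicity proxies (assumed examples).
LANGUAGE_MARKERS = {
    "chinese": "Languages: Mandarin, English",
    "malay": "Languages: Malay, English",
    "indian": "Languages: Tamil, English",
    "caucasian": "Languages: English, French",
}

# Hobby/activity markers as gender proxies (assumed examples).
HOBBY_MARKERS = {
    "male": "Hobbies: basketball, fantasy football",
    "female": "Hobbies: netball, contemporary dance",
}

BASE_RESUME = "Experience: 5 years in data analytics.\n{language}\n{hobbies}"

def generate_variants(base: str) -> dict:
    """One resume variant per (ethnicity, gender) marker pair; all other
    content stays identical, so any scoring gap traces to the markers."""
    return {
        (eth, gen): base.format(language=lang, hobbies=hob)
        for (eth, lang), (gen, hob) in product(
            LANGUAGE_MARKERS.items(), HOBBY_MARKERS.items()
        )
    }

variants = generate_variants(BASE_RESUME)
print(len(variants))  # 4 ethnicities x 2 genders = 8 variants per base resume
```

Because each variant shares everything except the markers, any systematic preference a model shows between two variants can be attributed to the demographic proxies alone.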
In both direct comparison and top-scoring shortlist settings, the models consistently recovered demographic attributes with high accuracy (measured by F1 scores) and exhibited systematic disparities: markers associated with Chinese and Caucasian males received preferential treatment. Crucially, the study found that language markers alone were sufficient for ethnicity inference, while gender inference relied primarily on hobbies and activities. Perhaps most concerning, prompting models to explain their decisions tended to amplify rather than reduce bias. This research fundamentally challenges the effectiveness of current anonymization practices in automated hiring pipelines.
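The "recovery" measurement amounts to scoring a model's inferred demographic labels against the ground-truth marker groups with a macro-averaged F1. A minimal stdlib sketch, with toy labels rather than the study's data:

```python
# Macro-F1 over inferred demographic labels. Labels below are toy values
# for illustration, not results from the paper.

def macro_f1(true_labels, pred_labels):
    """Per-class F1 averaged over all classes present in the gold labels."""
    classes = set(true_labels)
    f1s = []
    for c in classes:
        tp = sum(t == c and p == c for t, p in zip(true_labels, pred_labels))
        fp = sum(t != c and p == c for t, p in zip(true_labels, pred_labels))
        fn = sum(t == c and p != c for t, p in zip(true_labels, pred_labels))
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        f1s.append(
            2 * precision * recall / (precision + recall)
            if precision + recall else 0.0
        )
    return sum(f1s) / len(f1s)

# Toy example: ethnicity labels a model inferred from language markers alone.
gold = ["chinese", "malay", "indian", "caucasian", "chinese", "malay"]
pred = ["chinese", "malay", "indian", "caucasian", "chinese", "indian"]
print(round(macro_f1(gold, pred), 3))  # 0.833
```

A macro-F1 near 1.0 would mean the model reliably reconstructs the "anonymized" attribute from the markers, which is the failure mode the study reports.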
- Tested 18 LLMs on 4,100 resume variants with only subtle sociocultural markers changed
- Models achieved high accuracy in inferring ethnicity (via language) and gender (via hobbies/activities)
- Systematic bias favored markers associated with Chinese and Caucasian male candidates
Why It Matters
Current resume anonymization is ineffective; companies using AI for hiring may face legal risks and miss qualified talent.
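One concrete way the legal risk surfaces is through adverse-impact auditing. Under the US EEOC's "four-fifths rule" of thumb, a group whose selection rate falls below 80% of the highest group's rate is flagged for adverse impact. The sketch below applies that check to shortlist counts; the group names and counts are invented for illustration, not figures from the study.

```python
# Adverse-impact check on shortlist outcomes using the four-fifths rule.
# Group names and counts are toy values, NOT the study's results.

def impact_ratios(selected: dict, total: dict) -> dict:
    """Each group's selection rate divided by the highest group's rate.
    A ratio below 0.8 is the conventional adverse-impact red flag."""
    rates = {g: selected[g] / total[g] for g in total}
    top = max(rates.values())
    return {g: r / top for g, r in rates.items()}

# Hypothetical shortlist counts per marker group.
selected = {"chinese_male": 45, "malay_female": 28, "indian_male": 33}
total = {"chinese_male": 100, "malay_female": 100, "indian_male": 100}

for group, ratio in impact_ratios(selected, total).items():
    flag = " (below 4/5 threshold)" if ratio < 0.8 else ""
    print(f"{group}: {ratio:.2f}{flag}")
```

With these toy numbers, the non-favored groups fall below the 0.8 threshold, the pattern a company could face if its screening model carries the disparities the study documents.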