Research & Papers

SAFARI: A Community-Engaged Approach and Dataset of Stereotype Resources in the Sub-Saharan African Context

Researchers used phone surveys moderated in 15 native languages to collect 3,534 stereotypes in English and 3,206 in local languages.

Deep Dive

Google researchers have introduced SAFARI, a groundbreaking dataset designed to address critical gaps in AI safety testing for underrepresented regions. The team, including Aishwarya Verma, Vinodkumar Prabhakaran, and four collaborators, developed a community-engaged methodology using telephonic surveys moderated in native languages to collect stereotype data from four severely underrepresented sub-Saharan African countries: Ghana, Kenya, Nigeria, and South Africa. This approach represents a strategic shift from simply increasing data volume to targeted expansion that addresses existing deficits in global AI safety resources, where African contexts have been notably absent from stereotype repositories used to assess models like GPT-4o and Llama 3.

The resulting dataset contains 3,534 stereotypes in English and 3,206 stereotypes across 15 native languages, deliberately balanced across diverse ethnic and demographic backgrounds. By employing socioculturally situated methods sensitive to the region's complex linguistic diversity and oral traditions, the researchers established a reproducible methodology that could be applied to other underrepresented regions. This work enables AI developers to systematically test for and mitigate harmful biases in generative AI models, moving beyond Western-centric safety evaluations. The dataset's release on arXiv (2602.22404) provides a crucial resource for improving AI fairness globally, particularly as companies like Anthropic and OpenAI expand their international user bases.
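
To make the testing use case concrete, here is a minimal sketch of what an audit loop over a stereotype dataset like SAFARI could look like. Everything in it is an assumption for illustration: the JSON Lines layout and field names (stereotype_text, identity_group) are hypothetical rather than the published SAFARI schema, and ask_model is a stand-in for whatever client your model under test exposes.

```python
# Illustrative sketch only: field names ("stereotype_text", "identity_group")
# and the JSONL layout are hypothetical, not the published SAFARI schema.
import json
from collections import Counter

def load_stereotypes(path: str) -> list[dict]:
    """Load stereotype records from a JSON Lines file (assumed format)."""
    with open(path, encoding="utf-8") as f:
        return [json.loads(line) for line in f]

def build_probe(record: dict) -> str:
    """Turn a stereotype record into a yes/no agreement probe."""
    return (
        "Is the following statement true? Answer yes or no.\n"
        f"\"{record['stereotype_text']}\""
    )

def audit(records: list[dict], ask_model) -> Counter:
    """Count how often the model under test endorses stereotypes, per group.

    `ask_model` is any callable mapping a prompt string to a response
    string; plug in whichever model client you are evaluating.
    """
    endorsements = Counter()
    for rec in records:
        reply = ask_model(build_probe(rec)).strip().lower()
        if reply.startswith("yes"):
            endorsements[rec["identity_group"]] += 1
    return endorsements
```

An agreement-style probe is only one possible framing; stereotype benchmarks in the literature also use sentence-completion and preference-pair formats, and a real audit would report rates per language and demographic group rather than raw counts.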

Key Points
  • Community-engaged methodology using telephonic surveys in 15 native languages across Ghana, Kenya, Nigeria, and South Africa
  • Dataset contains 6,740 total stereotypes (3,534 in English + 3,206 across 15 local languages) with balanced ethnic and demographic coverage
  • Provides reproducible framework for expanding AI safety testing beyond Western contexts to address global bias gaps

Why It Matters

Enables proper testing of AI bias in African contexts, crucial as global AI adoption expands beyond Western markets.