In your own words: computationally identifying interpretable themes in free-text survey data

arXiv cs.CY March 31, 2026

A new computational method identifies nuanced themes such as 'belonging' and 'fluidity' in free-text survey responses.

Deep Dive

A team of Stanford University researchers (Jenny S. Wang, Aliya Saperstein, and Emma Pierson) has developed an AI-powered framework called 'In Your Own Words' to tackle a persistent problem in social science: analyzing free-text survey responses. Traditional computational methods often miss nuance; this framework uses natural language processing (NLP) to identify structured, interpretable themes directly from open-ended answers. It goes beyond simple keyword matching to surface complex, human-understandable concepts, offering a systematic complement to manual qualitative coding.

The researchers validated their method on a novel dataset of 1,004 U.S. participants' written descriptions of their race, gender, and sexual orientation. The analysis revealed themes like 'belonging' and 'identity fluidity' that standard survey questions fail to capture. These themes have three key applications: they can inform the design of better structured questions for future surveys, explain additional variation in outcomes like health and well-being within broad demographic categories, and illuminate systematic discordance between how people self-identify and how they are perceived by others.
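To make "explaining additional variation within broad demographic categories" concrete, here is a minimal Python sketch. It is not the authors' analysis: the data are made up, the `theme` flag and the `r_squared` helper are hypothetical, and the comparison uses simple group means instead of whatever regression specification the paper uses. It compares how much outcome variance a broad category explains on its own with how much is explained once a theme indicator splits each category further.

```python
from collections import defaultdict
from statistics import mean


def r_squared(outcomes, groups):
    """Share of outcome variance explained by group means.

    Equivalent to the R^2 of a regression with one dummy per group.
    """
    grand_mean = mean(outcomes)
    total_ss = sum((y - grand_mean) ** 2 for y in outcomes)

    # Collect outcomes by group so each group's mean can be computed.
    by_group = defaultdict(list)
    for y, g in zip(outcomes, groups):
        by_group[g].append(y)

    # Residual variation left over after subtracting each group's own mean.
    within_ss = sum(
        (y - mean(ys)) ** 2 for ys in by_group.values() for y in ys
    )
    return 1 - within_ss / total_ss


# Entirely made-up toy data, for illustration only.
category = ["A", "A", "A", "A", "B", "B", "B", "B"]  # broad demographic category
theme = [1, 1, 0, 0, 1, 1, 0, 0]  # hypothetical flag: response mentions a theme
outcome = [4.0, 4.0, 2.0, 2.0, 5.0, 5.0, 3.0, 3.0]  # e.g. a well-being score

r2_category = r_squared(outcome, category)
r2_with_theme = r_squared(outcome, list(zip(category, theme)))
gain = r2_with_theme - r2_category

print(f"category only:    R^2 = {r2_category:.2f}")
print(f"category + theme: R^2 = {r2_with_theme:.2f}")
print(f"additional variation explained: {gain:.2f}")
```

In this toy data the theme flag separates respondents who share a category but differ in outcome, so the second R^2 is higher. With real survey data the gain would need to be checked against overfitting, for example on held-out respondents.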

More broadly, the 'In Your Own Words' framework is intended for use across a wide range of survey settings, from market research to public policy. As a scalable tool for exploratory text analysis, it aims to help researchers draw richer insights from what respondents write in their own words, narrowing the gap between quantitative scale and qualitative depth.

Key Points

Identifies interpretable themes like 'belonging' and 'fluidity' from free-text survey data more precisely than previous computational methods.
Validated on a new dataset of 1,004 U.S. participants' descriptions of race, gender, and sexual orientation.
Reveals hidden heterogeneity within standard categories and systematic misrecognition between self-identified and perceived identities.

Why It Matters

Enables scalable, nuanced analysis of open-ended survey responses, transforming qualitative insights into actionable data for researchers and policymakers.
