AI Safety

Stochastic Parrots or Singing in Harmony? Testing Five Leading LLMs for their Ability to Replicate a Human Survey with Synthetic Data

AI-generated survey responses from five top models produced 'harmonized' conventional wisdom, missing key human counterintuitions.

Deep Dive

A new study, 'Stochastic Parrots or Singing in Harmony?', by researchers Jason Miklian, Kristian Hoelscher, and John E. Katsos delivers a critical reality check for the burgeoning use of AI-generated synthetic data in research. The team tested five leading large language models (ChatGPT Thinking 5 Pro, Claude Sonnet 4.5 Pro plus Claude CoWork 1.123, Gemini Advanced 2.5 Pro, Incredible 1.0, and DeepSeek 3.2) against a real-world survey of 420 Silicon Valley coders and developers. Their key finding: while these advanced AI agents can produce technically plausible and replicable data, they collectively fail to capture the nuanced, counterintuitive insights that made the original human survey valuable. Instead, the models tended toward 'harmonized' outputs that parrot conventional wisdom, with the real human data often appearing as the statistical outlier.

The research carries profound implications for organizational research practice, where the use of synthetic respondents is scaling rapidly. The study found that deviations in the AI-generated responses clustered together across all five models, suggesting a systemic limitation in their ability to model novel human social beliefs, especially in contexts lacking extensive prior documentation. The authors argue this demonstrates a growing capacity for models to 'sing in harmony' with one another rather than reveal new knowledge. Consequently, they propose that synthetic survey data should not be treated as a substitute for rigorous human methods, but as a useful tool before or after fieldwork for surfacing societal assumptions and conventional wisdom. The paper calls for the development of robust validation protocols and reporting standards to govern the responsible use of synthetic data in future research.
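To make the idea of a validation protocol concrete, here is a minimal sketch of one possible check: comparing the answer distribution of synthetic respondents against real human responses on a single survey question using total variation distance. This is an illustrative example only, not the authors' method; the question format, response counts, and threshold interpretation are all hypothetical.

```python
# Illustrative validation check (not the study's protocol): flag survey
# questions where synthetic respondents diverge from human respondents.
from collections import Counter

def answer_distribution(responses):
    """Turn a list of categorical answers into a probability distribution."""
    counts = Counter(responses)
    total = sum(counts.values())
    return {option: n / total for option, n in counts.items()}

def total_variation_distance(p, q):
    """TVD in [0, 1]; 0 means identical distributions, 1 means disjoint."""
    options = set(p) | set(q)
    return 0.5 * sum(abs(p.get(o, 0.0) - q.get(o, 0.0)) for o in options)

# Hypothetical answers to one Likert-style question (420 respondents each).
human = ["agree"] * 120 + ["neutral"] * 90 + ["disagree"] * 210      # counterintuitive skew
synthetic = ["agree"] * 260 + ["neutral"] * 100 + ["disagree"] * 60  # 'harmonized' consensus

tvd = total_variation_distance(answer_distribution(human),
                               answer_distribution(synthetic))
print(f"TVD = {tvd:.3f}")  # large values mark questions needing human follow-up
```

A reporting standard of this kind might require publishing per-question divergence scores alongside any synthetic dataset, so readers can see exactly where the models echoed conventional wisdom instead of the human signal.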

Key Points
  • Tested five top LLMs (ChatGPT 5 Pro, Claude 4.5 Pro, Gemini 2.5 Pro, Incredible 1.0, DeepSeek 3.2) against a 420-person human survey.
  • AI-generated responses were technically plausible but missed key counterintuitive human insights, producing 'harmonized' conventional wisdom.
  • The study concludes synthetic data is not a valid substitute for human surveys but can be a tool for identifying societal assumptions.

Why It Matters

This research establishes critical guardrails for using AI in organizational studies, preventing over-reliance on synthetic data that lacks human nuance.