The Oracle's Fingerprint: Correlated AI Forecasting Errors and the Limits of Bias Transmission
Top LLMs make identical mistakes on 568 predictions—collective intelligence may be an illusion.
A new study by Theodor Spiro, published on arXiv, tests whether large language models (LLMs) form an 'epistemic monoculture' in which individual model errors are no longer independent, undermining the statistical foundation of collective intelligence. In Study 1, the author evaluated GPT-4o, Claude, and Gemini on 568 resolved binary prediction questions and found a mean pairwise error correlation of r = 0.77 (p < 0.001), which held at r = 0.78 even after excluding questions that may have leaked into training data. In other words, three independently developed frontier models share strikingly similar failure modes.
Study 2 examined whether this correlated bias has propagated into human crowd forecasts, using a within-question design that tracked community prediction shifts across the ChatGPT launch in November 2022. Community forecasts did move in the direction predicted by LLMs (r = 0.20, p = 0.007), but the shift was fully explained by rational updating toward ground truth, not by AI influence. Study 3 examined category-level patterns of human forecasting error and found that pre-ChatGPT human biases already strongly resembled the LLM bias fingerprint (r = 0.87); surprisingly, the resemblance weakened post-ChatGPT (r = -0.28). Together, these findings point to an epistemic monoculture that is 'built but not yet activated': AI systems mirror the very biases humans already hold, setting up a potential feedback loop as reliance on AI forecasts grows.
- Three major LLMs (GPT-4o, Claude, Gemini) show a mean error correlation of r = 0.77 on 568 binary forecasting questions, indicating strongly overlapping failure modes.
- Human crowd forecasts shifted in LLM-predicted directions after ChatGPT's launch, but the shift was entirely explained by rational updating, not AI bias transmission.
- Pre-ChatGPT human biases already matched the LLM pattern (r = 0.87), while post-ChatGPT the resemblance weakened (r = -0.28), suggesting AI mirrors existing human blind spots.
Why It Matters
Shared AI blind spots could undermine ensemble forecasting and amplify systemic errors, making AI forecasts less reliable for high-stakes decisions.