Research & Papers

Large Language Models Are Bad Dice Players: LLMs Struggle to Generate Random Numbers from Statistical Distributions

11 frontier models tested. Only a 7% median pass rate in batch generation.

Deep Dive

A new paper accepted to ACL 2026, titled "Large Language Models Are Bad Dice Players," presents the first large-scale audit of native probabilistic sampling in frontier LLMs. Researchers Minda Zhao, Yilun Du, and Mengyu Wang benchmarked 11 models across 15 statistical distributions using a dual-protocol design: Batch Generation (1,000 samples in one response) and Independent Requests (1,000 stateless calls). The results reveal a sharp asymmetry—batch generation achieved only a 7% median pass rate, while independent requests collapsed almost entirely, with 10 of 11 models failing every distribution. Sampling fidelity also degraded monotonically with distributional complexity and as the sampling horizon N increased.
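The paper's exact pass/fail criterion isn't given in this summary, but a chi-square goodness-of-fit test is one standard way such an audit could flag a skewed batch. The sketch below is illustrative, not the authors' protocol: the `biased` sample mimics an LLM over-producing one value, and all function names are hypothetical.

```python
import random
from collections import Counter

def chi_square_uniform(samples, k):
    """Chi-square goodness-of-fit statistic against a discrete uniform on 1..k."""
    n = len(samples)
    expected = n / k
    counts = Counter(samples)
    return sum((counts.get(i, 0) - expected) ** 2 / expected
               for i in range(1, k + 1))

random.seed(0)
# A genuine uniform sampler (fair six-sided die, 1,000 draws)
fair = [random.randint(1, 6) for _ in range(1000)]
# A skewed sampler that over-represents "1", as a model might
biased = [1] * 400 + [random.randint(2, 6) for _ in range(600)]

# Critical value for df=5 at alpha=0.05 is about 11.07
print(chi_square_uniform(fair, 6))    # small statistic: passes
print(chi_square_uniform(biased, 6))  # large statistic: fails
```

Run over 1,000-sample batches per distribution, a battery of such tests would yield pass rates of the kind the paper reports.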

These failures aren't just academic. The study demonstrates real-world impact: models couldn't enforce uniform answer-position constraints in multiple-choice question generation, and they systematically violated demographic targets in attribute-constrained text-to-image prompt synthesis. The authors conclude that current LLMs lack a functional internal sampler, making external tools necessary for any application requiring statistical guarantees—from Monte Carlo simulations to fairness-aware content generation. The paper highlights a critical blind spot as LLMs move from chat interfaces to integral components of stochastic pipelines and systems approaching general intelligence.
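The summary doesn't name the external tools the authors recommend; a minimal sketch of the delegation pattern, where the model only specifies a distribution and a conventional PRNG draws the actual samples, might look like this (all names and the spec format are hypothetical):

```python
import random

# Hypothetical dispatcher: the LLM emits a distribution spec,
# and a conventional PRNG (never the model) produces the samples.
SAMPLERS = {
    "uniform_int": lambda p, rng: rng.randint(p["low"], p["high"]),
    "gaussian":    lambda p, rng: rng.gauss(p["mean"], p["std"]),
    "bernoulli":   lambda p, rng: int(rng.random() < p["p"]),
}

def draw(spec, n, seed=None):
    """Draw n samples for a spec like {"name": ..., "params": {...}}."""
    rng = random.Random(seed)
    sampler = SAMPLERS[spec["name"]]
    return [sampler(spec["params"], rng) for _ in range(n)]

samples = draw({"name": "bernoulli", "params": {"p": 0.3}}, 1000, seed=42)
print(sum(samples) / 1000)  # empirical rate close to 0.3
```

This split keeps the statistical guarantee with the sampler while the model handles only the symbolic task of choosing the distribution.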

Key Points
  • Batch generation achieved only a 7% median pass rate across 11 models and 15 distributions
  • Independent request mode collapsed: 10 of 11 models failed ALL distributions tested
  • Failures propagate to real tasks like biased MCQ generation and skewed demographic prompts
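The MCQ failure suggests a simple workaround in the spirit of the authors' external-tool conclusion: enforce the uniform answer-position constraint outside the model. A hedged sketch with hypothetical names, using Python's PRNG to place the correct option:

```python
import random
from collections import Counter

def place_answer(question, correct, distractors, rng):
    """Shuffle options so the correct answer's position is uniform."""
    options = [correct] + distractors
    rng.shuffle(options)
    return {"question": question,
            "options": options,
            "answer_index": options.index(correct)}

rng = random.Random(7)
positions = [place_answer("q", "right", ["w1", "w2", "w3"], rng)["answer_index"]
             for _ in range(4000)]
# Over 4,000 items, each of the 4 positions should appear roughly 1,000 times
print(Counter(positions))
```

The model still writes the question and distractors; only the position assignment, where the paper found systematic bias, is delegated.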

Why It Matters

LLMs can't reliably generate random numbers, threatening fairness and accuracy in stochastic pipelines.