Guinness Statistician's Student's t-Test Corrects Small Sample Errors

Without correction, a 90% confidence interval computed from just 7 samples is about 1.2x too narrow.

Deep Dive

William Sealy Gosset, a statistician at Guinness in the early 1900s, revolutionized brewing by applying existing statistical methods and inventing new ones. Realizing that confidence intervals based on the normal distribution come out too narrow when the sample is small, he developed the Student's t-distribution, publishing under the pseudonym "Student" to keep Guinness's competitive edge secret. His key insight: when you estimate the standard deviation from a small sample, that estimate is itself uncertain, so confidence intervals must be wider. For a 90% confidence interval, the rule of thumb is to multiply the sample standard deviation (and hence the width of the normal-based interval) by about 4 with 2 samples, 2 with 3 samples, 1.5 with 4, 1.3 with 5, and 1.2 with 6-8. These factors are approximately the ratio of the t-score at n - 1 degrees of freedom to the normal z-score of 1.645. Beyond 20 samples the correction shrinks to about 5% or less, so the naïve normal assumption is sufficient.
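The correction factors quoted above can be reproduced, at least roughly, as the ratio of a t-score to the matching normal z-score. The sketch below is a minimal, standard-library-only illustration: the helper names (`t_pdf`, `t_cdf`, `t_quantile`, `correction_factor`) are hypothetical, and the Simpson's-rule integration plus bisection is just one simple way to obtain t-quantiles, not Gosset's own method.

```python
import math
from statistics import NormalDist


def t_pdf(x: float, df: int) -> float:
    """Density of Student's t-distribution with df degrees of freedom."""
    log_norm = (math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
                - 0.5 * math.log(df * math.pi))
    return math.exp(log_norm - (df + 1) / 2 * math.log1p(x * x / df))


def t_cdf(x: float, df: int, steps: int = 1000) -> float:
    """CDF for x >= 0: 0.5 plus Simpson's rule on [0, x] (steps must be even)."""
    if x == 0:
        return 0.5
    h = x / steps
    total = t_pdf(0.0, df) + t_pdf(x, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    return 0.5 + total * h / 3


def t_quantile(p: float, df: int) -> float:
    """Invert the CDF by bisection; assumes 0.5 < p < 1 and a quantile below 100."""
    lo, hi = 0.0, 100.0
    for _ in range(40):
        mid = (lo + hi) / 2
        if t_cdf(mid, df) < p:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


def correction_factor(n: int, confidence: float = 0.90) -> float:
    """Ratio of the t-score (n - 1 degrees of freedom) to the normal z-score."""
    p = 0.5 + confidence / 2  # two-sided interval, so 0.95 for 90% confidence
    return t_quantile(p, n - 1) / NormalDist().inv_cdf(p)


for n in range(2, 9):
    print(f"n={n}: widen the normal-based interval by x{correction_factor(n):.2f}")
```

Under these assumptions the factors come out at roughly 3.84, 1.78, 1.43, 1.30, 1.22, 1.18 and 1.15 for n = 2 through 8, which round to the rules of thumb in the text.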

With just two samples, the sample standard deviation can badly understate the true spread. For two values it equals their absolute difference divided by √2, or about 0.71 times the difference. Multiplying by a t-score of about 1.84 (the one-standard-deviation multiplier for n=2) gives a corrected estimate of approximately 1.3 times the difference between the two values. This rough but practical rule lets practitioners quickly gauge variability from minimal data. These corrections matter to anyone analyzing small datasets, from A/B testing to quality control, because they ensure that uncertainty is honestly reflected. Gosset's work remains a cornerstone of modern statistics, enabling reliable inference even when data is scarce.
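The two-sample rule can be checked by hand. With n=2 there is one degree of freedom, where the t-distribution is a Cauchy distribution and its quantiles have a closed form. The sketch below assumes "one standard deviation" means the central 68.27% coverage of a normal distribution; the function names are hypothetical. Under that assumption the t-score comes out at roughly 1.84, close to the figure quoted above.

```python
import math
from statistics import stdev


def one_sigma_t_score_df1() -> float:
    """t-score (1 degree of freedom) covering the same mass as +/- 1 sigma of a normal."""
    coverage = math.erf(1 / math.sqrt(2))    # about 0.6827
    # For df = 1 the two-sided coverage is (2 / pi) * atan(t), so invert it directly.
    return math.tan(math.pi * coverage / 2)  # about 1.84


def corrected_sd_two_samples(a: float, b: float) -> float:
    """Sample standard deviation of two values, widened by the t-score."""
    return one_sigma_t_score_df1() * stdev([a, b])  # stdev([a, b]) == |a - b| / sqrt(2)


a, b = 10.0, 12.0
print(f"naive sample SD:   {stdev([a, b]):.3f}")
print(f"corrected SD:      {corrected_sd_two_samples(a, b):.3f}")
print(f"1.3 * |a - b|:     {1.3 * abs(a - b):.3f}")
```

For the values 10 and 12 the corrected estimate and the 1.3 times difference shortcut agree to within about 0.1%, which is why the shortcut is good enough for quick estimates.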

Key Points
  • For 90% confidence intervals with small samples, multiply the sample standard deviation by a correction factor (e.g., about 4x for n=2, 2x for n=3).
  • Student's t-distribution corrects for uncertainty in the estimated standard deviation; Gosset derived it while working to brew better beer at Guinness.
  • With only two samples, a rough estimate of the standard deviation is 1.3 times the difference between the two values (after t-correction).

Why It Matters

Accurate confidence intervals from tiny samples avoid overconfident conclusions in data science and quality control.