Polynomial histograms cut memory costs for large-scale system metrics
New technique uses polynomial annotations to store more information per bin.
Distributed systems collect vast amounts of performance metrics—like latency, throughput, or error rates—across many machines, users, and operations. Storing these empirical distributions compactly without losing critical information is a major challenge, especially for long-tailed distributions common in system telemetry. In this 2014 preprint now on arXiv, Stokely, Hesterberg, Merchant, and Coehlo from Google (implied by the production system context) propose polynomial histograms: each histogram bin is annotated with statistical moments (mean, variance, skewness, etc.) of the data points falling into that bin. This annotation preserves more information than simply increasing the number of bins for the same storage budget.
The authors define an information loss metric for binned data and use it to compare polynomial histograms against traditional histograms. They demonstrate that for a fixed storage cost, polynomial histograms yield lower information loss, particularly for long-tailed distributions. The technique is validated on real file system metrics from a large production system. The paper provides an analytical characterization of when polynomial histograms are optimal, offering a practical tool for engineers who need to aggregate and query distribution data across multiple dimensions with limited memory.
- Polynomial histograms store statistical moments (e.g., mean, variance) inside each bin, not just counts.
- A novel information loss metric optimizes bin allocation for any histogram representation.
- Real-world testing on large-production file system metrics shows polynomial histograms beat traditional ones at equal storage cost.
Why It Matters
Better memory efficiency means more accurate monitoring of distributed systems without scaling storage costs.