AI Safety

METR's data can't distinguish between trajectories (and 80% horizons are an order of magnitude off)

A reanalysis claims METR's headline AI capability forecasts are massively overstated.

Deep Dive

A Bayesian reanalysis of METR's task data finds that the data cannot distinguish between exponential and superexponential growth in AI capability. Four trajectory models fit the historical data equally well yet produce widely divergent forecasts, and the reanalysis suggests 80% success horizons may be overstated by an order of magnitude. For example, the 95% credible intervals for a key milestone range from 2028 to 2033 depending on the model chosen, which underscores how uncertain current predictions are.
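To make the core point concrete, here is a minimal sketch of how two growth models can fit the same short history about equally well and still disagree sharply once extrapolated. It is not the reanalysis's method: it uses ordinary least squares rather than a Bayesian fit, compares two models rather than four, and every number in it is invented for illustration (the observations, the noise values, and the forecast date are all assumptions, not METR data).

```python
# Illustrative sketch only: synthetic numbers, NOT METR's data, and plain
# least squares instead of the Bayesian fit described in the reanalysis.

def solve(a, b):
    """Solve the linear system a x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                factor = m[r][col] / m[col][col]
                m[r] = [x - factor * y for x, y in zip(m[r], m[col])]
    return [m[i][n] / m[i][i] for i in range(n)]

def polyfit(ts, ys, degree):
    """Least-squares polynomial fit via the normal equations (coefficients, lowest order first)."""
    k = degree + 1
    ata = [[sum(t ** (i + j) for t in ts) for j in range(k)] for i in range(k)]
    aty = [sum(y * t ** i for t, y in zip(ts, ys)) for i in range(k)]
    return solve(ata, aty)

def predict(coeffs, t):
    """Evaluate the fitted polynomial at time t."""
    return sum(c * t ** i for i, c in enumerate(coeffs))

def rmse(coeffs, ts, ys):
    """Root-mean-square error of the fit on the observed points."""
    return (sum((predict(coeffs, t) - y) ** 2 for t, y in zip(ts, ys)) / len(ts)) ** 0.5

# Hypothetical history: t in years, y = log2(task horizon), generated from a
# purely linear trend in log space (exponential growth) plus fixed, made-up noise.
ts = [0.0, 1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
noise = [0.2, -0.3, 0.1, -0.2, 0.0, -0.1, 0.4]
ys = [1.0 + 1.0 * t + e for t, e in zip(ts, noise)]

# Linear in log space = exponential growth; quadratic in log space = superexponential.
lin = polyfit(ts, ys, 1)
quad = polyfit(ts, ys, 2)

rmse_lin = rmse(lin, ts, ys)
rmse_quad = rmse(quad, ts, ys)

# Extrapolate six years past the last observation (an arbitrary choice).
t_future = 12.0
gap = predict(quad, t_future) - predict(lin, t_future)  # difference in log2 units
ratio = 2.0 ** gap                                      # same difference as a multiplicative factor

print(f"in-sample RMSE (log2 units): exponential={rmse_lin:.3f}, superexponential={rmse_quad:.3f}")
print(f"forecast gap at t={t_future:g}: {gap:.2f} doublings, i.e. a factor of about {ratio:.1f}")
```

In this toy setup the two fits differ only slightly in in-sample error, yet the extrapolated horizons end up several doublings apart, even though the underlying trend was generated as purely exponential. The curvature the second model picks up comes entirely from noise, which is the kind of ambiguity a short, noisy history cannot resolve.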

Why It Matters

The finding calls into question the reliability of high-profile AI timelines that inform policy and investment decisions, and suggests these forecasts are far less certain than they are presented to be.