Research & Papers

Backcasting forecast errors: model collapsing to mean [P]

RandomForest fails to predict forecast errors, shrinking predictions to zero

Deep Dive

A practitioner on a forum describes a frustrating time series backcasting problem: they have daily forecasts issued between 2020 and 2026, each with horizons 1–14 days ahead, along with actual measurements and climatological normals. The goal is to backcast the forecast error (forecast minus actual) for dates before 2020. Despite engineering features like lagged anomalies, rolling statistics, and target encoding, and tuning a RandomForest regressor (max_depth 10–15, min_leaf 10, 300 trees), the model consistently shrinks predictions toward zero, the mean of the error distribution.

This behavior is typical of a model that cannot find a signal in the data: with nothing to learn from, the loss-minimizing prediction is the average of the target, which here is close to zero. The practitioner notes that MAE is no better than predicting zero at any horizon, and that the prediction uncertainty (std, quantiles) is severely underestimated. The issue may stem from the error being essentially noise once seasonality and trend are removed via the normals, leaving no systematic pattern for the model to learn. Suggestions from the community include trying gradient boosting (XGBoost, LightGBM) with Poisson or Tweedie loss (both of which assume a non-negative target, so they would apply to the error magnitude rather than the signed error), using residual modeling after a simpler baseline, or incorporating domain knowledge about forecast bias.
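The collapse toward the mean can be reproduced on synthetic data. The sketch below is not the poster's pipeline: the features and target are made-up Gaussian noise, and mapping the post's "min_leaf 10" to scikit-learn's min_samples_leaf=10 is an assumption. It only illustrates what a RandomForest with roughly those settings does when the target carries no signal.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.metrics import mean_absolute_error

rng = np.random.default_rng(0)

# Synthetic stand-in data (an assumption, not the poster's dataset):
# five features that carry no information about the target.
n_train, n_test, n_features = 3000, 1000, 5
X_train = rng.normal(size=(n_train, n_features))
X_test = rng.normal(size=(n_test, n_features))
y_train = rng.normal(size=n_train)  # "forecast error" that is pure noise
y_test = rng.normal(size=n_test)

# Settings loosely matching the post: 300 trees, depth in the 10-15 range,
# at least 10 samples per leaf.
model = RandomForestRegressor(
    n_estimators=300, max_depth=12, min_samples_leaf=10,
    random_state=0, n_jobs=-1,
)
model.fit(X_train, y_train)
pred = model.predict(X_test)

# Compare against the trivial "always predict zero" baseline.
mae_model = mean_absolute_error(y_test, pred)
mae_zero = mean_absolute_error(y_test, np.zeros_like(y_test))

# Compare the spread of the predictions with the spread of the target.
pred_std = pred.std()
target_std = y_test.std()

print(f"MAE model: {mae_model:.3f}  MAE zero baseline: {mae_zero:.3f}")
print(f"std of predictions: {pred_std:.3f}  std of target: {target_std:.3f}")
```

Each leaf averages at least ten training targets and the forest then averages 300 trees, so on a noise target the predictions end up far narrower than the target itself while MAE stays at about the zero-baseline level. If real data shows the same two symptoms, the features likely carry little usable signal, whatever the tuning.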

Key Points
  • RandomForest predictions collapse to the mean (near-zero error) across all horizons
  • MAE matches a constant zero prediction, even at horizon 1
  • Extensive feature engineering and hyperparameter tuning failed to improve results
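The two symptoms in the key points (MAE no better than zero, predictions with too little spread) can be checked per horizon with a few lines of code. This is a minimal sketch using only the standard library; the function name skill_by_horizon, the tuple layout, and the toy numbers are hypothetical and chosen for illustration.

```python
from collections import defaultdict
from statistics import mean, pstdev


def skill_by_horizon(rows):
    """rows: iterable of (horizon, actual_error, predicted_error) tuples.

    Returns {horizon: (mae_skill, spread_ratio)} where
      mae_skill    = 1 - MAE(model) / MAE(always predict zero)
      spread_ratio = std(predictions) / std(actual errors)
    Assumes the actual errors at each horizon are not all zero or constant.
    """
    grouped = defaultdict(list)
    for horizon, actual, predicted in rows:
        grouped[horizon].append((actual, predicted))

    report = {}
    for horizon, pairs in sorted(grouped.items()):
        actuals = [a for a, _ in pairs]
        preds = [p for _, p in pairs]
        mae_model = mean(abs(a - p) for a, p in pairs)
        mae_zero = mean(abs(a) for a in actuals)
        report[horizon] = (
            1.0 - mae_model / mae_zero,       # > 0 means better than zero
            pstdev(preds) / pstdev(actuals),  # << 1 means collapsed spread
        )
    return report


# Toy, made-up numbers purely for illustration.
rows = [
    (1, 2.0, 0.1), (1, -2.0, -0.1), (1, 1.0, 0.0), (1, -1.0, 0.0),
    (2, 3.0, 0.0), (2, -3.0, 0.0), (2, 1.0, 0.0), (2, -1.0, 0.0),
]
report = skill_by_horizon(rows)
for horizon, (skill, spread) in report.items():
    print(f"horizon {horizon}: MAE skill {skill:.3f}, spread ratio {spread:.3f}")
```

A skill near zero together with a spread ratio far below one at every horizon is the pattern the post describes. A skill that is clearly positive at short horizons and fades at longer ones would instead suggest some learnable structure.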

Why It Matters

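A common ML pitfall: when the target is essentially noise, a complex model either overfits the training data or falls back to predicting the mean, and further feature engineering or tuning does not change that.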