CAST outperforms Transformers on distribution-valued time series with a 1.27 average KL rank
A novel causal forecasting algorithm beats 11 baselines across 11 benchmarks
Many real-world systems, from queue occupancies and mobility shares to public health mixtures and air quality profiles, generate observations that live on the probability simplex (nonnegative values that sum to 1) and evolve over time. Traditional forecasting methods treat these observations as unconstrained vectors, ignoring the constraints and dynamics of the simplex. Researchers from multiple institutions introduce CAST (Causal Anchored Simplex Transport), a successor-local operator that works directly on the simplex. It retrieves empirical successors from the causal context, stabilizes them with a persistence anchor, and applies a bounded local stochastic transport on ordered supports, so every step preserves the simplex by construction.
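The three stages described above can be illustrated with a minimal sketch. This is not the authors' implementation: the nearest-neighbour retrieval rule, the fixed anchor weight `alpha`, the transport budget `eps`, and all function names are assumptions chosen only to show why each stage keeps the forecast on the simplex.

```python
import numpy as np

def retrieve_successors(history, query, k=3):
    """Average the successors of the k past states closest to `query` (L1 distance).

    Only pairs (history[t], history[t+1]) that are already observed are used,
    so the lookup is causal. The k-NN rule is an illustrative assumption.
    """
    past, nxt = history[:-1], history[1:]
    dists = np.abs(past - query).sum(axis=1)
    idx = np.argsort(dists)[:k]
    return nxt[idx].mean(axis=0)  # a mean of simplex points stays on the simplex

def local_transport(p, eps=0.1):
    """Move at most `eps` of each bin's mass to its neighbours on an ordered support."""
    n = p.size
    T = np.eye(n) * (1.0 - eps)
    for i in range(n):
        left, right = max(i - 1, 0), min(i + 1, n - 1)
        T[i, left] += eps / 2
        T[i, right] += eps / 2
    return p @ T  # T is row-stochastic, so total mass is conserved

def step(history, alpha=0.5, k=3, eps=0.1):
    """One forecast step: retrieve, anchor to the current state, then transport."""
    current = history[-1]
    retrieved = retrieve_successors(history, current, k)
    anchored = alpha * current + (1.0 - alpha) * retrieved  # convex combination
    return local_transport(anchored, eps)

# Synthetic history: 20 observations on a 5-bin simplex.
rng = np.random.default_rng(0)
history = rng.dirichlet(np.ones(5), size=20)
forecast = step(history)
```

Each stage is either a convex combination of simplex points or a multiplication by a row-stochastic matrix, which is why the output remains nonnegative and sums to 1 without any renormalization or softmax.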
The authors also identify a structural failure mode called latent transition-kernel aliasing, in which similar observed distributions evolve differently under different contextual regimes. They prove that any forecaster relying only on an aliased summary incurs an irreducible excess risk, bounded below by a weighted Jensen-Shannon term, while the CAST hypothesis class contains the optimal regime-aware Bayes successor. On eleven public and simulated benchmarks spanning ecology, energy, diet, mortality, employment, air quality, severe weather, mobility, and queue occupancy, CAST attains the best average rank on both one-step KL divergence (1.27) and autoregressive rollout Jensen-Shannon divergence, or JSD (1.91), winning 8/11 sections on each metric against a broad set of baselines including Transformers, RNNs, and statistical methods. Component ablations and synthetic aliasing experiments confirm the theory.
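A toy calculation can make the aliasing argument concrete. The sketch below relies on a standard identity rather than on the paper's exact theorem: when one observed distribution has different successors under different hidden regimes, the single forecast that minimizes expected KL is the weighted mixture of those successors, and its expected KL equals the weighted Jensen-Shannon divergence between them. The two successors and the regime weights are made-up values.

```python
import numpy as np

def kl(p, q, eps=1e-12):
    """KL divergence KL(p || q) in nats, with a small eps to guard log(0)."""
    p = np.asarray(p, dtype=float)
    q = np.asarray(q, dtype=float)
    return float(np.sum(p * (np.log(p + eps) - np.log(q + eps))))

def weighted_jsd(dists, weights):
    """Weighted Jensen-Shannon divergence: sum_i w_i * KL(p_i || mixture)."""
    dists = np.asarray(dists, dtype=float)
    weights = np.asarray(weights, dtype=float)
    mixture = weights @ dists
    return float(sum(w * kl(p, mixture) for w, p in zip(weights, dists)))

def expected_kl(dists, weights, forecast):
    """Expected one-step KL of a single regime-blind forecast."""
    return float(sum(w * kl(p, forecast) for w, p in zip(weights, dists)))

# Hypothetical successors of the SAME observed state under two hidden regimes.
successors = np.array([[0.8, 0.1, 0.1],
                       [0.1, 0.1, 0.8]])
weights = np.array([0.5, 0.5])

best_blind = weights @ successors            # best forecast without regime info
blind_risk = expected_kl(successors, weights, best_blind)
floor = weighted_jsd(successors, weights)    # equals blind_risk by construction
aware_risk = sum(w * kl(p, p) for w, p in zip(weights, successors))  # zero
```

In this toy setting a regime-aware forecaster reaches zero risk, while any forecaster that sees only the aliased observation cannot go below `floor`, no matter how it is trained.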
- CAST uses a successor-local operator with a persistence anchor and bounded local stochastic transport, all of which preserve the simplex structure.
- The authors identify latent transition-kernel aliasing and prove that CAST's hypothesis class contains the optimal Bayes successor, while forecasters that see only an aliased summary face an irreducible risk lower bound.
- CAST achieves the best average rank on one-step KL (1.27) and rollout JSD (1.91), winning 8/11 sections on each metric against Transformers, RNNs, and statistical baselines.
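For readers unfamiliar with the average-rank summary, the sketch below shows how such a number is typically computed. The scores and method names are invented for illustration and are not results from the paper; the tie-free ranking is also an assumption.

```python
import numpy as np

# Hypothetical scores (lower is better): rows = benchmarks, columns = methods.
methods = ["method_a", "method_b", "method_c"]
scores = np.array([
    [0.10, 0.20, 0.30],
    [0.15, 0.12, 0.40],
    [0.05, 0.09, 0.07],
    [0.22, 0.30, 0.25],
])

# Rank methods within each benchmark (1 = best); assumes no ties.
ranks = scores.argsort(axis=1).argsort(axis=1) + 1

average_rank = ranks.mean(axis=0)   # one value per method
wins = (ranks == 1).sum(axis=0)     # benchmarks on which each method is best
```

Here `method_a` has an average rank of 1.25 and wins 3 of 4 benchmarks. An average rank close to 1 therefore means a method is best or near-best on most benchmarks, even if it does not win all of them.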
Why It Matters
Could enable more accurate forecasts in critical domains such as public health, traffic, and energy grids by respecting the natural distributional structure of their data.