Research & Papers

Soft-MSM: Differentiable Context-Aware Elastic Alignment for Time Series

Context-aware alignment loss that beats Soft-DTW on clustering and classification tasks.

Deep Dive

Elastic distances such as dynamic time warping (DTW) are central to comparing time series with local misalignment. Soft-DTW, a differentiable DTW variant, does not handle the context-aware transition costs found in distances like Move-Split-Merge (MSM). MSM uses piecewise split/merge penalties that depend on the local alignment context, and it often outperforms DTW in classification and clustering. To bridge this gap, Christopher Holder and Anthony Bagnall propose Soft-MSM, a smooth relaxation that replaces MSM's hard split/merge costs with a gated surrogate, so that gradients flow end to end through both the recursion and the transition logic. They derive forward and backward recursions, a soft alignment matrix, and a closed-form gradient, along with a divergence-corrected formulation for stability.
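
To make "context-aware" concrete, the sketch below shows the classic hard MSM recursion as it is usually stated in the literature, followed by the soft-min operator that Soft-DTW substitutes for the hard minimum. It is not the authors' Soft-MSM: the gated surrogate is not specified in this summary, so none is attempted here. The function names (msm_cost, msm_distance, soft_min) and the default c=1.0 are illustrative assumptions, not the aeon API.

```python
import math


def msm_cost(new, prev, other, c):
    """Split/merge penalty for inserting `new` next to `prev`, opposite `other`.

    The cost is the flat penalty c when `new` lies between its two
    neighbours, otherwise c plus the distance to the nearer neighbour.
    This dependence on neighbouring values is the "context".
    """
    if prev <= new <= other or prev >= new >= other:
        return c
    return c + min(abs(new - prev), abs(new - other))


def msm_distance(x, y, c=1.0):
    """Classic (hard, non-differentiable) MSM between two univariate series."""
    n, m = len(x), len(y)
    D = [[0.0] * m for _ in range(n)]
    D[0][0] = abs(x[0] - y[0])
    # First column and first row: only split or merge steps are possible.
    for i in range(1, n):
        D[i][0] = D[i - 1][0] + msm_cost(x[i], x[i - 1], y[0], c)
    for j in range(1, m):
        D[0][j] = D[0][j - 1] + msm_cost(y[j], x[0], y[j - 1], c)
    for i in range(1, n):
        for j in range(1, m):
            move = D[i - 1][j - 1] + abs(x[i] - y[j])
            split = D[i - 1][j] + msm_cost(x[i], x[i - 1], y[j], c)
            merge = D[i][j - 1] + msm_cost(y[j], x[i], y[j - 1], c)
            D[i][j] = min(move, split, merge)  # hard min: no useful gradient
    return D[n - 1][m - 1]


def soft_min(values, gamma):
    """Smooth minimum used by Soft-DTW: -gamma * log(sum(exp(-v / gamma))).

    Shifted by the true minimum for numerical stability.
    """
    lo = min(values)
    return lo - gamma * math.log(sum(math.exp(-(v - lo) / gamma) for v in values))
```

For example, msm_distance([1.0, 2.0, 3.0], [1.0, 2.0, 3.0]) returns 0.0, and soft_min approaches the ordinary minimum as gamma shrinks. Replacing the hard min with soft_min smooths the recursion, but the if/else inside msm_cost remains a hard branch; that branch is the part the paper's gated surrogate is described as relaxing.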

In experiments across 112 UCR time series datasets, Soft-MSM achieves lower MSM barycentre loss than existing MSM barycentre methods and yields significantly better clustering and nearest-centroid classification accuracy than Soft-DTW-based alternatives. The method integrates directly into the open-source aeon toolkit, making it accessible to practitioners. This work unlocks gradient-based optimization for a broader class of elastic distances, promising improvements in time series tasks like anomaly detection, forecasting, and shape analysis where local context matters.

Key Points
  • Soft-MSM extends differentiable alignment to context-aware MSM distances using smooth gated surrogates for split/merge penalties.
  • On 112 UCR datasets, it achieves lower MSM barycentre loss than existing MSM barycentre methods and significantly better clustering and nearest-centroid classification than Soft-DTW-based alternatives.
  • Implementation is available in the open-source aeon toolkit for easy integration into machine learning pipelines.

Why It Matters

Enables gradient-based training with context-aware elastic distances, boosting time series classification and clustering performance.