Introducing Feature-Based Trajectory Clustering, a clustering algorithm for longitudinal data
A new two-step algorithm maps individual time-series data into Euclidean space to find common evolutionary patterns.
Researchers Marie-Pierre Sylvestre and Laurence Boulanger have published a paper on arXiv, 'Introducing Feature-Based Trajectory Clustering, a clustering algorithm for longitudinal data,' presenting an algorithm designed for clustering longitudinal data. This type of data, common in fields like medicine, finance, and social sciences, tracks how a variable changes over time for many individuals. The core challenge is that although each individual's trajectory is unique, groups of individuals often share underlying patterns, or 'characteristic features,' of evolution. The algorithm provides a structured way to uncover these shared patterns.
The method operates in two distinct phases. First, it transforms each individual's raw time-series data into a single point in a multi-dimensional Euclidean space. This transformation is governed by mathematical formulas designed to extract and quantify characteristic features of the temporal evolution. Second, with every individual represented as a point in this feature space, the algorithm applies the established Spectral Clustering technique to the resulting point cloud to identify distinct groups. This two-step design separates feature engineering from the clustering itself. Compared with clustering raw, irregular time series directly, it offers a flexible and potentially more interpretable framework for understanding how a population evolves over time.
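The two phases can be illustrated with a minimal Python sketch using NumPy. It is not the authors' implementation. The three features (mean level, linear slope, range), the Gaussian similarity with width `sigma`, and the helper names `trajectory_features` and `spectral_bipartition` are all assumptions chosen for illustration, since the summary does not list the paper's actual feature formulas. The clustering step is a simplified two-group form of spectral clustering that splits on the sign of the Fiedler vector of the graph Laplacian.

```python
import numpy as np

def trajectory_features(times, values):
    """Map one trajectory to a point in feature space.
    The features here are illustrative assumptions, not the paper's formulas."""
    t = np.asarray(times, dtype=float)
    y = np.asarray(values, dtype=float)
    slope = np.polyfit(t, y, 1)[0]  # overall linear trend
    return np.array([y.mean(), slope, y.max() - y.min()])

def spectral_bipartition(points, sigma=1.0):
    """Split points into two groups using the Fiedler vector of the
    unnormalized graph Laplacian (a simplified two-cluster spectral clustering)."""
    X = np.asarray(points, dtype=float)
    # Standardize features so that no single feature dominates distances.
    X = (X - X.mean(axis=0)) / (X.std(axis=0) + 1e-12)
    # Gaussian similarity between every pair of points.
    sq_dists = ((X[:, None, :] - X[None, :, :]) ** 2).sum(axis=-1)
    W = np.exp(-sq_dists / (2.0 * sigma**2))
    np.fill_diagonal(W, 0.0)
    # Unnormalized Laplacian L = D - W.
    L = np.diag(W.sum(axis=1)) - W
    # eigh returns eigenvalues in ascending order; column 1 is the Fiedler vector.
    _, eigvecs = np.linalg.eigh(L)
    return (eigvecs[:, 1] > 0).astype(int)

# Toy data (made up): three rising and three falling trajectories,
# observed at different, irregular time points.
trajectories = [
    ([0, 1, 2, 3, 4], [0.0, 1.0, 2.0, 3.0, 4.0]),
    ([0, 2, 4], [0.5, 2.5, 4.5]),
    ([0, 1, 3, 4], [0.2, 1.2, 3.2, 4.2]),
    ([0, 1, 2, 3, 4], [9.0, 8.0, 7.0, 6.0, 5.0]),
    ([0, 2, 4], [9.5, 7.5, 5.5]),
    ([0, 1, 3, 4], [9.2, 8.2, 6.2, 5.2]),
]

# Step 1: one feature vector per individual. Step 2: cluster the point cloud.
features = np.array([trajectory_features(t, y) for t, y in trajectories])
labels = spectral_bipartition(features)
print(labels)
```

Because each trajectory is reduced to a fixed-length feature vector, individuals observed at different times or with different numbers of measurements can still be compared. For more than two groups, a full spectral clustering implementation such as scikit-learn's `SpectralClustering` could replace the bipartition step.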
- Designed for longitudinal data, which tracks a variable over time for many individuals (e.g., patient health metrics).
- Uses a two-step process: feature-based mapping to Euclidean space followed by Spectral Clustering.
- Aims to find groups where individuals share underlying characteristic patterns in how their data evolves, not just static similarity.
Why It Matters
Could improve pattern discovery in temporal data, with potential applications in healthcare, customer behavior analysis, and quantitative finance.