• de
Prof. Dr. Emmanuel Müller


Time series collections are abundant in many disciplines of science, especially those dealing with Earth observation data like vegetation levels, precipitation or surface temperature measurements. Correlation analysis for all pairs of time series is often the first step of exploratory data analysis since it can reveal causal relationships hidden in the data. When exploring large collections of time series, pairwise correlation computations are a significant challenge due to the inherent quadratic complexity of the problem. In many real-world time series collections the time series naturally group into clusters with highly similar behavior. We exploit this natural structure to rapidly approximate the full pairwise correlation matrix for a time series collection without computing all pairwise correlations. The main idea of our estimation algorithm COREQ (CORrelation EQuivalence) is to compute equivalence classes of highly correlated time series and pool redundant correlation estimates into a single class estimate.

Erik Scharwächter, Fabian Geier, Lukas Faber, Emmanuel Müller. Low Redundancy Estimation of Correlation Matrices for Time Series using Triangular Bounds. In: Proceedings of the 22nd Pacific-Asia Conference on Knowledge Discovery and Data Mining (PAKDD), 2018.

Corresponding author: Erik Scharwächter

Equivalence classes

The plot below shows equivalence classes obtained by COREQ on the StarLightCurves training dataset (UCR). Colors encode the class memberships. The presence of large and tight equivalence classes indicates a high redundancy in the full correlation matrix.

Low redundancy estimates

The upper row shows the full (uncompressed) estimates for the pairwise correlation matrices from the UCR Time Series Classification Archive. Low redundancy estimates obtained by COREQ are shown below. Matrices are ordered by increasing average loss (row major, top left with lowest loss to bottom right with highest loss).