Lossy Compression of Time Series Data

Bachelor Project - Winter 2018/19 and Summer 2019

People: Prof. Dr. Tobias Friedrich, Dr. Timo Kötzing, Dr. Manuel Rizzo, Martin Schirneck
Link: Bachelor Projects IT-Systems Engineering

Photo credit: HPI/K. Herschelmann

The bachelor project is a cornerstone of the practice-oriented studies at the Hasso Plattner Institute. At its core, a group of students works together to develop a solution for a software project posed by an industrial partner. For the bachelor project 2018/19, we work with Industrial Analytics IA GmbH.

Problem and Motivation

Time series data is data derived from consecutive measurements over time. It appears in all domains of applied science and engineering. Algorithms for processing such data face the additional challenge of exploiting the linear structure of the data rather than treating the time component as a mere additional data value.

Storing time series data can quickly exhaust the available storage. The random noise introduced by physical measurements fundamentally limits the compression rate that lossless schemes can achieve. For effective compression of high-frequency data, we are therefore forced to apply lossy compression schemes.
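The effect of measurement noise on lossless compression can be illustrated with a small experiment. The following is only a sketch under stated assumptions: the synthetic sine signal, the noise level, the 16-bit quantization, and the use of zlib are all illustrative choices and are not taken from the project or its data.

```python
import math
import random
import struct
import zlib


def compressed_size(samples):
    """Quantize samples to 16-bit integers and return the zlib-compressed size in bytes."""
    # Scale factor 16000 is an arbitrary illustrative choice; values are clamped to int16.
    quantized = [max(-32768, min(32767, round(s * 16000))) for s in samples]
    raw = struct.pack(f"<{len(quantized)}h", *quantized)
    return len(zlib.compress(raw, 9))


rng = random.Random(0)  # fixed seed so the experiment is reproducible
n = 10_000

# Hypothetical "ideal" signal: a sine wave with a period of 100 samples.
clean = [math.sin(2 * math.pi * t / 100) for t in range(n)]
# The same signal with small Gaussian measurement noise added.
noisy = [x + rng.gauss(0.0, 0.01) for x in clean]

clean_size = compressed_size(clean)
noisy_size = compressed_size(noisy)
print("clean:", clean_size, "bytes; noisy:", noisy_size, "bytes")
```

Even though the noise is small compared to the signal amplitude, the noisy series compresses far worse, because the lossless compressor has to preserve every random low-order bit.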

We want to evaluate existing algorithms and develop new ones for the lossy compression of high-frequency time series data from different domains. The key question is how much compression can be achieved without losing too much information, so that the data can still be analyzed and used for modeling and statistical predictions. The scientific method will be twofold:

an empirical study of different methods on real-world data;
a thorough analysis of the theoretical limitations.

Our project partner Industrial Analytics IA GmbH is a young IoT company with a scientific background in physics and mechanical engineering. They supply the team with high-dimensional industrial data sets spanning a period of several years. The data has been obtained from physical measurements of large industrial machines. To help the team understand the time series data, the partner company provides details of the particular industrial machines and the underlying thermodynamics. During the project, they will support the team with the required background in mechanical engineering and industrial data analytics.

Results

The work of this bachelor project was presented at the Bachelorpodium 2019. Some of the results have been published at the 27th International Conference on Neural Information Processing (ICONIP 2020).

Friedrich, Tobias; Krejca, Martin S.; Lagodzinski, J. A. Gregor; Rizzo, Manuel; Zahn, Arthur. Memetic Genetic Algorithms for Time Series Compression by Piecewise Linear Approximation. International Conference on Neural Information Processing (ICONIP) 2020: 592–604

@inproceedings{friedrich2020memetic,
  abstract = {Time series are sequences of data indexed by time. Such data are collected in various domains, often in massive amounts, such that storing them proves challenging. Thus, time series are commonly stored in a compressed format. An important compression approach is piecewise linear approximation (PLA), which only keeps a small set of time points and interpolates the remainder linearly. Picking a subset of time points such that the PLA minimizes the mean squared error to the original time series is a challenging task, naturally lending itself to heuristics. We propose the piecewise linear approximation genetic algorithm (PLA-GA) for compressing time series by PLA. The PLA-GA is a memetic \((\mu + \lambda)\) GA that makes use of two distinct operators tailored to time series compression. First, we add special individuals to the initial population that are derived using established PLA heuristics. Second, we propose a novel local search operator that greedily improves a compressed time series. We compare the PLA-GA empirically with existing evolutionary approaches and with a deterministic PLA algorithm, known as Bellman's algorithm, that is optimal for the restricted setting of sampling. In both cases, the PLA-GA approximates the original time series better and quicker. Further, it drastically outperforms Bellman's algorithm with increasing instance size with respect to run time until finding a solution of equal or better quality -- we observe speed-up factors between 7 and 100 for instances of 90,000 to 100,000 data points.},
  author = {Friedrich, Tobias and Krejca, Martin S. and Lagodzinski, J. A. Gregor and Rizzo, Manuel and Zahn, Arthur},
  booktitle = {International Conference on Neural Information Processing (ICONIP)},
  keywords = {arthurzahn gregorlagodzinski iconip manuelrizzo martinskrejca tobiasfriedrich year2020},
  pages = {592--604},
  title = {Memetic Genetic Algorithms for Time Series Compression by Piecewise Linear Approximation},
  volume = 12534,
  year = 2020
}
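The abstract describes piecewise linear approximation (PLA) as keeping a small set of time points, interpolating the rest linearly, and judging the result by the mean squared error. The sketch below illustrates only that basic idea. It is not the PLA-GA from the paper: the function names `pla_reconstruct`, `mse`, and `uniform_indices` are hypothetical, and uniform sampling of the kept points is a naive placeholder for the heuristics discussed there.

```python
def pla_reconstruct(series, kept):
    """Rebuild the full series from the kept indices by linear interpolation.

    `kept` must be sorted, strictly increasing, and contain the first and
    last index of `series`.
    """
    approx = [0.0] * len(series)
    for left, right in zip(kept, kept[1:]):
        span = right - left
        for t in range(left, right + 1):
            w = (t - left) / span  # interpolation weight in [0, 1]
            approx[t] = (1 - w) * series[left] + w * series[right]
    return approx


def mse(series, approx):
    """Mean squared error between the original and the approximated series."""
    return sum((a - b) ** 2 for a, b in zip(series, approx)) / len(series)


def uniform_indices(n, k):
    """Pick about k evenly spaced indices (k >= 2), always including both endpoints."""
    return sorted({round(i * (n - 1) / (k - 1)) for i in range(k)})


# Toy series shaped like a tent: three well-chosen points describe it exactly.
series = [0, 1, 2, 3, 4, 3, 2, 1, 0]

kept_three = uniform_indices(len(series), 3)  # indices 0, 4, 8
error_three = mse(series, pla_reconstruct(series, kept_three))

kept_two = [0, len(series) - 1]               # endpoints only
error_two = mse(series, pla_reconstruct(series, kept_two))

print(kept_three, error_three)
print(kept_two, error_two)
```

On this toy input, keeping the peak in addition to the endpoints reproduces the series without error, while keeping only the endpoints flattens it to a constant line. Choosing which points to keep is the optimization problem the abstract refers to.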

Hasso Plattner Institute

E-Mail: Arthur.Zahn(at)student.hpi.de

Project Team

Prof. Dr. Tobias Friedrich

Dr. Timo Kötzing

Dr. Manuel Rizzo

Martin Schirneck

Jorin Heide

Linus Heinzl

Nicolas Klodt

Felix Mujkanovic

Lars Seifert

Arthur Zahn
