We are very happy to announce that our former colleague Sebastian Schmidl has successfully defended his Ph.D. dissertation, titled "Scalable Time Series Anomaly Detection and Clustering".
You can find the abstract below.
Anomalies are interesting properties of time series because they indicate important events, such as production faults in manufacturing processes, delivery bottlenecks in predictive maintenance, earthquakes in environmental monitoring, or heart failures in healthcare. To identify such anomalous subsequences in time series data, scientists typically use a semi-automatic, interactive analysis process consisting of two steps: first, anomaly detection is the activity of identifying the location, magnitude, and length of anomalous subsequences in time series; then, anomaly clustering is applied to identify known and unknown types of anomalies and potentially common causes and semantics. Existing algorithms for anomaly detection and clustering are often difficult to apply because choosing the most effective combination of anomaly detection and clustering algorithms with suitable parameterizations and adequate specifications of the to-be-expected numbers and types of anomalies heavily relies on difficult-to-manage factors, such as the need for domain knowledge, time-intensive manual experimentation, costly time series (dis)similarity computations, and a pervasive lack of training data. To facilitate anomaly analysis in real-world scenarios, we require fast and interactive algorithms that can be applied even if there is no training data available.
In this thesis, (i) we analyze existing time series anomaly detection algorithms for their strengths and weaknesses to identify the best algorithm for a specific dataset and task, (ii) we propose an unsupervised anomaly detection system, which parameterizes, executes, and ensembles various highly effective anomaly detection algorithms, to present an interactive scoring ranking for an arbitrary time series without a need for training data or parameter expertise, and (iii) we propose the first progressive hierarchical clustering system for variable-length time series, which creates and continuously improves an approximate dendrogram that eventually converges to the exact dendrogram, to allow fast, interactive clustering of time series anomalies.
The Database Group team wishes you all the best for the future!