Grulich, Philipp Marian; Saitenmacher, René; Traub, Jonas; Breß, Sebastian; Rabl, Tilmann; Markl, Volker
Proceedings of the 21th International Conference on Extending Database Technology, EDBT 2018, Vienna, Austria, March 26-29, 2018.
Machine learning techniques for data stream analysis suffer from concept drifts such as changed user preferences, varying weather conditions, or economic changes. These concept drifts cause wrong predictions and lead to incorrect business decisions. Concept drift detection methods such as adaptive windowing (Adwin) allow for adapting to concept drifts on the fly. In this paper, we examine Adwin in detail and point out its throughput bottlenecks. We then introduce several parallelization alternatives to address these bottlenecks. Our optimizations lead to a speedup of two orders of magnitude over the original Adwin implementation. Thus, we explore parallel adaptive windowing to provide scalable concept detection for high-velocity data streams with millions of tuples per second.