Analyzing Efficient Stream Processing on Modern Hardware

Zeuch, Steffen; Breß, Sebastian; Rabl, Tilmann; Monte, Bonaventura Del; Karimov, Jeyhun; Lutz, Clemens; Renz, Manuel; Traub, Jonas; Markl, Volker in PVLDB 2019.

Modern Stream Processing Engines (SPEs) process large data volumes under tight latency constraints. Many SPEs execute processing pipelines using message passing on shared-nothing architectures and apply a partition-based scale-out strategy to handle high-velocity input streams. Furthermore, many state-of-the-art SPEs rely on a Java Virtual Machine to achieve platform independence and speed up system development by abstracting from the underlying hardware. In this paper, we show that taking the underlying hardware into account is essential to exploit modern hardware efficiently. To this end, we conduct an extensive experimental analysis of current SPEs and SPE design alternatives optimized for modern hardware. Our analysis highlights potential bottlenecks and reveals that state-of-the-art SPEs are not capable of fully exploiting current and emerging hardware trends, such as multi-core processors and high-speed networks. Based on our analysis, we describe a set of design changes to the common architecture of SPEs to scale-up on modern hardware. We show that the single-node throughput can be increased by up to two orders of magnitude compared to state-of-the-art SPEs by applying specialized code generation, fusing operators, batch-style parallelization strategies, and optimized windowing. This speedup allows for deploying typical streaming applications on a single or a few nodes instead of large clusters.
Tags: modernhardware  myown  streamprocessing  vldb