We are happy to announce that four SIGMOD papers and one VLDB paper have been accepted for presentation at the 2020 conferences.
1) Rhino: Efficient Management of Very Large Distributed State for Stream Processing Engines written by Bonaventura Del Monte, Steffen Zeuch, Tilmann Rabl, Volker Markl, ACM SIGMOD/PODS International Conference on Management of Data, Portland, OR, USA, 2020
Scale-out stream processing engines (SPEs) are powering large big data applications on high velocity data streams. Industrial setups require SPEs to sustain outages, varying data rates, and low-latency processing. SPEs need to transparently reconfigure stateful queries during runtime. However, state-of-the-art SPEs are not ready yet to handle on-the-fly reconfigurations of queries with terabytes of state due to three problems. These are network overhead for state migration, consistency, and overhead on data processing. In this paper, we propose Rhino, a library for efficient reconfigurations of running queries in the presence of very large distributed state. Rhino provides a handover protocol and a state migration protocol to consistently and efficiently migrate stream processing among servers. Overall, our evaluation shows that Rhino scales with state sizes of up to TBs, reconfigures a running query 15 times faster than the state-of-the-art, and reduces latency by three orders of magnitude upon a reconfiguration.
2) Pump Up the Volume: Processing Large Data on GPUs with Fast Interconnects written by Clemens Lutz, Sebastian Breß, Steffen Zeuch, Tilmann Rabl, Volker Markl, ACM SIGMOD/PODS International Conference on Management of Data, Portland, OR, USA, 2020
GPUs have long been discussed as accelerators for database query processing because of their high processing power and memory bandwidth. However, two main challenges limit the utility of GPUs for large-scale data processing: (1) the onboard memory capacity is too small to store large data sets, yet (2) the interconnect bandwidth to CPU main-memory is insufficient for ad-hoc data transfers. As a result, GPU-based systems and algorithms run into a transfer bottleneck and do not scale to large data sets. In practice, CPUs process large-scale data faster than GPUs with current technology. In this paper, we investigate how a fast interconnect can resolve these scalability limitations using the example of NVLink 2.0. NVLink 2.0 is a new interconnect technology that links dedicated GPUs to a CPU. The high bandwidth of NVLink 2.0 enables us to overcome the transfer bottleneck and to efficiently process large data sets stored in main-memory on GPUs. We perform an in-depth analysis of NVLink 2.0 and show how we can scale a no-partitioning hash join beyond the limits of GPU memory. Our evaluation shows speedups of up to 18× over PCI-e 3.0 and up to 7.3× over an optimized CPU implementation. Fast GPU interconnects thus enable GPUs to efficiently accelerate query processing.
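To illustrate the join variant named in the abstract above, the following is a minimal CPU-only sketch of a no-partitioning hash join in Python. It is not the paper's GPU implementation and does not model NVLink transfers; the function name `no_partitioning_hash_join` and the `(key, payload)` row layout are assumptions made for this example. The defining property shown is that the build side goes into a single shared hash table without first partitioning either input.

```python
from collections import defaultdict


def no_partitioning_hash_join(build_rows, probe_rows):
    """Join two lists of (key, payload) tuples using one shared hash table.

    Hypothetical helper for illustration only; not taken from the paper.
    """
    # Build phase: insert every build-side row into a single hash table.
    # Neither input is partitioned beforehand.
    table = defaultdict(list)
    for key, payload in build_rows:
        table[key].append(payload)

    # Probe phase: look up each probe-side key and emit all matches.
    result = []
    for key, payload in probe_rows:
        for build_payload in table.get(key, ()):
            result.append((key, build_payload, payload))
    return result


if __name__ == "__main__":
    orders = [(1, "order-a"), (2, "order-b")]
    items = [(2, "item-x"), (3, "item-y"), (1, "item-z")]
    print(no_partitioning_hash_join(orders, items))
```

In a setting like the one the abstract describes, the hash table or the probe-side input may exceed GPU memory, which is where the interconnect bandwidth becomes the limiting factor.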
3) Grizzly: Efficient Stream Processing Through Adaptive Query Compilation written by Philipp M. Grulich, Sebastian Breß, Steffen Zeuch, Jonas Traub, Janis von Bleichert, Zongxiong Chen, Tilmann Rabl, Volker Markl, ACM SIGMOD/PODS International Conference on Management of Data, Portland, OR, USA, 2020
Stream Processing Engines (SPEs) execute long-running queries on unbounded data streams. They rely on managed runtimes, an interpretation-based processing model, and do not perform runtime optimizations. Recent research states that this limits the utilization of modern hardware and neglects changing data characteristics at runtime. In this paper, we present Grizzly, a novel adaptive query compilation-based SPE to enable highly efficient query execution on modern hardware. We extend query-compilation and task-based parallelization for the unique requirements of stream processing and apply adaptive compilation to enable runtime re-optimizations. The combination of light-weight statistic gathering with just-in-time compilation enables Grizzly to dynamically adjust to changing data-characteristics at runtime. Our experiments show that Grizzly achieves up to an order of magnitude higher throughput and lower latency compared to state-of-the-art interpretation-based SPEs.
4) Optimizing Machine Learning Workloads in Collaborative Environments written by Behrouz Derakhshan, Alireza Rezaei Mahdiraji, Ziawasch Abedjan, Tilmann Rabl, Volker Markl, ACM SIGMOD/PODS International Conference on Management of Data, Portland, OR, USA, 2020
Effective collaboration among data scientists results in high-quality and efficient machine learning (ML) workloads. In a collaborative environment, such as Kaggle or Google Colaboratory, users typically re-execute or modify published scripts to recreate or improve the result. This introduces many redundant data processing and model training operations. Reusing the data generated by the redundant operations leads to the more efficient execution of future workloads. However, existing collaborative environments lack a data management component for storing and reusing the result of previously executed operations.
In this paper, we present a system to optimize the execution of ML workloads in collaborative environments by reusing previously performed operations and their results. We utilize a so-called Experiment Graph (EG) to store the artifacts, i.e., raw and intermediate data or ML models, as vertices and operations of ML workloads as edges. In theory, the size of EG can become unnecessarily large, while the storage budget might be limited. At the same time, for some artifacts, the overall storage and retrieval cost might outweigh the recomputation cost. To address this issue, we propose two algorithms for materializing artifacts based on their likelihood of future reuse. Given the materialized artifacts inside EG, we devise a linear-time reuse algorithm to find the optimal execution plan for incoming ML workloads. Our reuse algorithm only incurs a negligible overhead and scales for the high number of incoming ML workloads in collaborative environments. Our experiments show that we improve the run-time by one order of magnitude for repeated execution of the workloads and 50% for the execution of modified workloads in collaborative environments.
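The load-or-recompute decision described above can be sketched as a single pass over a workload graph. This is a simplified illustration under stated assumptions, not the paper's algorithm: the names `plan_reuse` and `prune_plan`, the cost dictionaries, and the additive cost model (which may double-count ancestors shared by several parents) are all hypothetical. Each artifact and edge is visited once, so the sketch runs in time linear in the size of the graph.

```python
def plan_reuse(order, parents, compute_cost, load_cost):
    """Decide per artifact whether to load it or recompute it.

    order:        artifact names in topological order (parents first)
    parents:      dict mapping an artifact to the artifacts it is derived from
    compute_cost: dict mapping an artifact to the cost of its producing operation
    load_cost:    dict with entries only for artifacts that are materialized
    All names and the cost model are assumptions for illustration.
    """
    best, decision = {}, {}
    for v in order:
        # Cost of recomputing v: its own operation plus obtaining its parents.
        recompute = compute_cost[v] + sum(best[p] for p in parents.get(v, []))
        # Artifacts that are not materialized cannot be loaded.
        load = load_cost.get(v, float("inf"))
        if load < recompute:
            best[v], decision[v] = load, "load"
        else:
            best[v], decision[v] = recompute, "compute"
    return decision, best


def prune_plan(target, parents, decision):
    """Walk back from the target and keep only artifacts the plan needs."""
    plan, stack = {}, [target]
    while stack:
        v = stack.pop()
        if v in plan:
            continue
        plan[v] = decision[v]
        # Parents are only needed when v is recomputed rather than loaded.
        if decision[v] == "compute":
            stack.extend(parents.get(v, []))
    return plan


if __name__ == "__main__":
    order = ["raw", "features", "model"]
    parents = {"features": ["raw"], "model": ["features"]}
    compute_cost = {"raw": 5, "features": 10, "model": 20}
    load_cost = {"features": 2}  # only "features" is materialized
    decision, best = plan_reuse(order, parents, compute_cost, load_cost)
    print(prune_plan("model", parents, decision), best["model"])
```

In this toy workload the materialized `features` artifact is cheaper to load than to rebuild from `raw`, so the pruned plan loads it and only retrains the model.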
1) Quantifying TPC-H Choke Points and Their Optimizations [Experiments and Analyses] written by Markus Dreseler, Martin Boissier, Tilmann Rabl, Matthias Uflacker, 46th International Conference on Very Large Data Bases, Tokyo, Japan, 2020
TPC-H continues to be the most widely used benchmark for relational OLAP systems. It poses a number of challenges, also known as "choke points", which database systems have to solve in order to achieve good benchmark results. Examples include joins across multiple tables, correlated subqueries, and correlations within the TPC-H data set. Knowing the impact of such optimizations helps in developing optimizers as well as in interpreting TPC-H results across database systems. This paper provides a systematic analysis of choke points and their optimizations. It complements previous work on TPC-H choke points by providing a quantitative discussion of their relevance. It focuses on eleven choke points where the optimizations are beneficial independently of the database system. Of these, the flattening of subqueries and the placement of predicates have the biggest impact. Four queries (Q2, Q4, Q17, and Q21) are strongly influenced by the choice of an efficient query plan; three others (Q1, Q13, and Q18) are less influenced by plan optimizations and more dependent on an efficient execution engine.