01.09.2021

Data Engineering Systems at VLDB - the 47th International Conference on Very Large Data Bases

The Data Engineering Systems Group attended this year's VLDB Conference in Copenhagen from August 16th to 20th. VLDB 2021 was the first fully hybrid conference in the database research community and was attended by 180 local and 800 remote researchers from academia and industry.

Wang Yue, Lawrence Benson, Tilmann Rabl, Ilin Tolovski, and Rafael Moczalla (from left to right) attended the conference in person, presented two papers and chaired two workshops.

Lawrence Benson presented the paper "Viper: An Efficient Hybrid PMem-DRAM Key-Value Store" by Lawrence Benson, Hendrik Makait, and Tilmann Rabl.

The code is available at https://github.com/hpides/viper.

Abstract

Key-value stores (KVSs) have found wide application in modern software systems. For persistence, their data resides in slow secondary storage, which requires KVSs to employ various techniques to increase their read and write performance from and to the underlying medium. Emerging persistent memory (PMem) technologies offer data persistence at close-to-DRAM speed, making them a promising alternative to classical disk-based storage. However, simply drop-in replacing existing storage with PMem does not yield good results, as block-based access behaves differently in PMem than on disk and ignores PMem’s byte addressability, layout, and unique performance characteristics. In this paper, we propose three PMem-specific access patterns and implement them in a hybrid PMem-DRAM KVS called Viper. We employ a DRAM-based hash index and a PMem-aware storage layout to utilize the random write speed of DRAM and the efficient sequential-write performance of PMem. Our evaluation shows that Viper significantly outperforms existing KVSs for core KVS operations while providing full data persistence. Moreover, Viper outperforms existing PMem-only, hybrid, and disk-based KVSs by 4–18x for write workloads, while matching or surpassing their get performance.
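
To give readers a feel for the hybrid idea described in the abstract, here is a minimal Python sketch. It is not Viper's implementation and does not use persistent memory: an ordinary append-only file stands in for PMem, and a Python dict stands in for the DRAM hash index. The class name `ToyHybridKV`, the record layout, and the recovery scan are all assumptions made for illustration. The sketch only shows the division of labor: writes go sequentially to the persistent log, while lookups and updates of the index happen in volatile memory.

```python
import os
import struct
import tempfile

# Hypothetical record layout: key length and value length as two
# little-endian unsigned 32-bit integers, followed by key and value bytes.
HEADER = struct.Struct("<II")


class ToyHybridKV:
    """Toy store: volatile dict index plus an append-only log file.

    The file is only a stand-in for PMem; this is an illustration,
    not Viper's storage layout.
    """

    def __init__(self, path):
        self.index = {}  # key (bytes) -> offset of the newest record
        # "a+b" creates the file if needed; every write is appended.
        self.log = open(path, "a+b")
        self._rebuild_index()

    def _rebuild_index(self):
        """Scan the log once to recover the volatile index after a restart."""
        self.log.seek(0)
        offset = 0
        while True:
            header = self.log.read(HEADER.size)
            if len(header) < HEADER.size:
                break
            key_len, val_len = HEADER.unpack(header)
            key = self.log.read(key_len)
            self.log.seek(val_len, os.SEEK_CUR)  # skip the value bytes
            self.index[key] = offset  # newer records replace older ones
            offset += HEADER.size + key_len + val_len

    def put(self, key, value):
        """Append the record sequentially, then update the in-memory index."""
        self.log.seek(0, os.SEEK_END)
        offset = self.log.tell()
        self.log.write(HEADER.pack(len(key), len(value)) + key + value)
        self.log.flush()
        os.fsync(self.log.fileno())  # make the record durable before indexing
        self.index[key] = offset

    def get(self, key):
        """Look up the offset in memory, then read the value from the log."""
        offset = self.index.get(key)
        if offset is None:
            return None
        self.log.seek(offset)
        key_len, val_len = HEADER.unpack(self.log.read(HEADER.size))
        self.log.seek(key_len, os.SEEK_CUR)  # skip the key bytes
        return self.log.read(val_len)

    def close(self):
        self.log.close()


if __name__ == "__main__":
    path = os.path.join(tempfile.mkdtemp(), "kv.log")
    store = ToyHybridKV(path)
    store.put(b"conference", b"VLDB 2021")
    print(store.get(b"conference"))
    store.close()
```

The sketch leaves out everything that makes a real PMem store hard, such as crash consistency for partially written records, space reclamation, and concurrency. For the actual design, see the paper and the code linked above.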

Ilin Tolovski presented the paper "A Survey of Big Data, High Performance Computing, and Machine Learning Benchmarks" by Nina Ihde, Paula Marten, Ahmed Eleliemy, Gabrielle Poerwawinata, Pedro Silva, Ilin Tolovski, Florina M. Ciorba, and Tilmann Rabl. The paper is the result of a research collaboration within the Horizon 2020 DAPHNE Project between the Hasso Plattner Institute and the High Performance Computing Group at the University of Basel.

Abstract

In recent years, there has been a convergence of Big Data (BD), High Performance Computing (HPC), and Machine Learning (ML) systems. This convergence is due to the increasing complexity of long data analysis pipelines on separated software stacks. With the increasing complexity of data analytics pipelines comes a need to evaluate their systems, in order to make informed decisions about technology selection, sizing and scoping of hardware. While there are many benchmarks for each of these domains, there is no convergence of these efforts. As a first step, it is also necessary to understand how the individual benchmark domains relate. In this work, we analyze some of the most expressive and recent benchmarks of BD, HPC, and ML systems. We propose a taxonomy of those systems based on individual dimensions such as accuracy metrics and common dimensions such as workload type. Moreover, we aim at enabling the usage of our taxonomy in identifying adapted benchmarks for their BD, HPC, and ML systems. Finally, we identify challenges and research directions related to the future of converged BD, HPC, and ML system benchmarking.

In addition to chairing sessions at the conference, Tilmann Rabl co-chaired the Ph.D. Workshop on Monday, August 16, and Ilin Tolovski, Rafael Moczalla, and Tilmann Rabl were local chairs for the TPC Technology Conference on Performance Evaluation and Benchmarking (TPCTC) on Friday, August 20.