We are happy to announce that two papers were accepted at EDBT 2022! The papers will be presented as part of the virtual EDBT conference from 29th March to 1st April, 2022.
1) Efficiently Managing Deep Learning Models in a Distributed Environment by Nils Straßenburg, Ilin Tolovski, and Tilmann Rabl
Deep learning has revolutionized many domains relevant in research and industry, including computer vision and natural language processing, by significantly outperforming previous state-of-the-art approaches. This is why deep learning models are part of many essential software applications. To guarantee their reliable and consistent performance even in changing environments, they need to be regularly adjusted, improved, and retrained, but also documented, deployed, and monitored. An essential part of this set of processes, referred to as model management, is saving and recovering models. To enable debugging, many applications require an exact model representation.
In this paper, we investigate if, and to what extent, we can outperform a baseline approach capable of saving and recovering models, while focusing on storage consumption, time-to-save, and time-to-recover. We present our Python library MMlib, offering three approaches: a baseline approach that saves complete model snapshots, a parameter update approach that saves only the updated model data, and a model provenance approach that saves the model’s provenance instead of the model itself. We evaluate all approaches in four distributed environments on different model architectures, model relations, and data sets. Our evaluation shows that both the model provenance and parameter update approach outperform the baseline by up to 15.8% and 51.7% in time-to-save and by up to 70.0% and 95.6% in storage consumption, respectively.
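To give a rough intuition for the parameter update idea, here is a minimal Python sketch (not MMlib's actual API; the function names and the dict-based parameter representation are illustrative assumptions): instead of persisting a full snapshot of a retrained model, only the parameters that changed relative to an already-stored base model are saved, and the full model is reconstructed on recovery.

```python
import pickle

def diff_params(base_params, new_params):
    """Return only the parameters that changed relative to the base model.
    (Illustrative sketch; parameters are modeled as a plain name->value dict.)"""
    return {name: value for name, value in new_params.items()
            if name not in base_params or base_params[name] != value}

def recover_params(base_params, update):
    """Rebuild the full parameter set from the base model and a saved update."""
    params = dict(base_params)
    params.update(update)
    return params

# Hypothetical base model and a fine-tuned version of it.
base = {"layer1.weight": [0.1, 0.2], "layer1.bias": [0.0]}
tuned = {"layer1.weight": [0.1, 0.2], "layer1.bias": [0.05]}

update = diff_params(base, tuned)            # only the changed bias is stored
snapshot = pickle.dumps(update)              # much smaller than a full snapshot
restored = recover_params(base, pickle.loads(snapshot))
```

The storage savings grow with the fraction of parameters left untouched between model versions, which is why the paper reports the largest gains for models that change little between saves.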
2) Evaluating In-Memory Hash Joins on Persistent Memory by Tobias Maltenberger, Till Lehmann, Lawrence Benson, Tilmann Rabl
Steady advances in processor and memory technologies have driven continuous tuning and redesigning of in-memory hash joins for decades. Over the years, research has shown advantages of both hardware-conscious radix joins and hardware-oblivious hash joins for different workloads. In this paper, we evaluate both join types on persistent memory (PMem) as an emerging memory technology offering close-to-DRAM speed at significantly higher capacities. We study the no partitioning join (NPO) and the parallel radix join (PRO) in PMem and analyze how their performance differs from DRAM-based execution. Our results show that the PRO is always at least as fast as the NPO in DRAM. However, in PMem, the NPO outperforms the PRO by up to 1.7×. Based on our findings, we provide an outlook into crucial design choices for PMem-optimized hash join implementations.
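For readers unfamiliar with the two join types: the no partitioning join builds a single shared hash table on one relation and probes it with the other, skipping the cache-conscious partitioning passes that a radix join performs first. The following Python sketch shows only the hardware-oblivious core idea (the relation layout as key/payload tuples is an assumption for illustration; the actual implementations studied in the paper are parallel C/C++ kernels):

```python
from collections import defaultdict

def no_partitioning_join(build_rel, probe_rel):
    """Minimal no-partitioning hash join sketch:
    build one hash table on the (typically smaller) relation,
    then probe it with every tuple of the other relation."""
    table = defaultdict(list)
    for key, payload in build_rel:           # build phase
        table[key].append(payload)
    return [(key, b, p)                      # probe phase
            for key, p in probe_rel
            for b in table.get(key, [])]

r = [(1, "a"), (2, "b")]
s = [(1, "x"), (1, "y"), (3, "z")]
result = no_partitioning_join(r, s)
```

A radix join would instead first partition both inputs by key bits so that each partition's hash table fits in cache; the paper's finding is that on PMem this extra partitioning work can cost more than it saves.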