Nils Straßenburg, Ilin Tolovski, and Tilmann Rabl received the EDBT 2022 Best Paper Award for their paper "Efficiently Managing Deep Learning Models in a Distributed Environment."
Deep learning has revolutionized many domains relevant to research and industry, including computer vision and natural language processing, by significantly outperforming previous state-of-the-art approaches. As a result, deep learning models are part of many essential software applications. To guarantee reliable and consistent performance even in changing environments, these models need to be regularly adjusted, improved, and retrained, but also documented, deployed, and monitored. An essential part of this set of processes, referred to as model management, is saving and recovering models. To enable debugging, many applications require an exact model representation.
In this paper, we investigate whether, and to what extent, we can outperform a baseline approach capable of saving and recovering models, focusing on storage consumption, time-to-save, and time-to-recover. We present our Python library MMlib, which offers three approaches: a baseline approach that saves complete model snapshots, a parameter update approach that saves only the updated model data, and a model provenance approach that saves the model's provenance instead of the model itself. We evaluate all approaches in four distributed environments on different model architectures, model relations, and data sets. Our evaluation shows that the model provenance and parameter update approaches outperform the baseline by up to 15.8% and 51.7% in time-to-save and by up to 70.0% and 95.6% in storage consumption, respectively.
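To make the difference between the baseline and the parameter update approach more concrete, here is a minimal Python sketch. It assumes a toy model representation (a dict mapping layer names to lists of floats) and uses hypothetical functions `save_full`, `save_update`, and `recover`. It is not MMlib's API, and it omits the model provenance approach, which would instead record the information needed to reproduce a model rather than its parameters.

```python
# Toy model: layer name -> list of float parameters.
# This representation is an illustrative assumption, not MMlib's format.
Model = dict[str, list[float]]


def save_full(model: Model) -> dict:
    """Baseline idea: store a complete snapshot of every parameter."""
    return {"type": "full", "params": {k: list(v) for k, v in model.items()}}


def save_update(base: Model, derived: Model, base_id: str) -> dict:
    """Parameter-update idea: store only layers that differ from the base."""
    changed = {
        name: list(values)
        for name, values in derived.items()
        if name not in base or base[name] != values
    }
    # Record layers present in the base but dropped in the derived model.
    removed = [name for name in base if name not in derived]
    return {"type": "update", "base_id": base_id,
            "params": changed, "removed": removed}


def recover(record: dict, store: dict[str, dict]) -> Model:
    """Rebuild the exact model by recovering its base and applying updates."""
    if record["type"] == "full":
        return {k: list(v) for k, v in record["params"].items()}
    model = recover(store[record["base_id"]], store)  # recurse along the chain
    for name in record["removed"]:
        model.pop(name, None)
    model.update({k: list(v) for k, v in record["params"].items()})
    return model


if __name__ == "__main__":
    store: dict[str, dict] = {}
    base = {"conv1": [0.1, 0.2], "fc": [0.5, 0.6]}
    store["m0"] = save_full(base)

    # Fine-tune only the last layer; the update record stores just "fc".
    tuned = {"conv1": [0.1, 0.2], "fc": [0.55, 0.61]}
    store["m1"] = save_update(base, tuned, base_id="m0")
    print(store["m1"]["params"])  # {'fc': [0.55, 0.61]}
    assert recover(store["m1"], store) == tuned
```

The sketch shows why the parameter update approach saves storage when only part of a model changes, for example during fine-tuning. The trade-off is that recovery must first rebuild the base model and then apply every update along the chain. The concrete savings reported in the paper come from MMlib's implementation, not from this sketch.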