Hasso-Plattner-Institut
Prof. Dr. Tilmann Rabl
 

Ana Klimovic

Affiliation: ETH Zurich
Title: Resource-Efficient ML with Scalable Input Data Processing

 

Abstract

Training deep neural networks (DNNs) is resource-intensive, time-consuming, and expensive. Despite their compute- and memory-intensive nature, DNN applications often underutilize expensive ML hardware accelerators, such as GPUs. This talk will explore how to improve the utilization of ML hardware accelerators by eliminating input data processing bottlenecks and efficiently sharing resources between jobs. We will discuss the characteristics of ML input data pipelines, which motivate the design of a new data preprocessing system architecture in which we disaggregate data processing from model training. I will present Cachew, a fully managed service for ML data processing, built on top of TensorFlow's data loading framework, tf.data. Cachew dynamically scales distributed resources for data processing to avoid input data stalls. The service also maintains a global view of data processing across jobs, which enables selectively caching preprocessed datasets to maximize training throughput and improve energy efficiency across jobs.
 
I will conclude by discussing further opportunities to improve the energy efficiency of DNN training, particularly in real-world settings where data dynamics require models to be frequently retrained. I will give a preview of our early work on Modyn, an open-source platform and benchmark suite for ML training on dynamic datasets, which enables researchers to explore data selection and retraining policies.

Short CV

Ana Klimovic is an Assistant Professor in the Systems Group of the Computer Science Department at ETH Zurich. Her research interests span operating systems, computer architecture, and their intersection with machine learning. Ana's work focuses on computer system design for large-scale applications such as cloud computing services, data analytics, and machine learning. Before joining ETH in August 2020, Ana was a Research Scientist at Google Brain and completed her Ph.D. in Electrical Engineering at Stanford University.