Ana Klimovic

Affiliation: ETH Zurich
Title: Resource-Efficient ML with Scalable Input Data Processing

Abstract

Training deep neural networks (DNNs) is resource-intensive, time-consuming, and expensive. Although DNN applications are compute- and memory-intensive, they often underutilize expensive ML hardware accelerators such as GPUs. This talk will explore how to improve the utilization of ML hardware accelerators by eliminating input data processing bottlenecks and efficiently sharing resources between jobs. I will discuss the characteristics of ML input data pipelines, which motivate the design of a new data preprocessing system architecture that disaggregates data processing from model training. I will present Cachew, a fully managed service for ML data processing, built on top of TensorFlow's data loading framework, tf.data. Cachew dynamically scales distributed resources for data processing to avoid input data stalls. The service also maintains a global view of data processing across jobs, which enables it to selectively cache preprocessed datasets to maximize training throughput and improve energy efficiency across jobs.

I will conclude by discussing further opportunities to improve the energy efficiency of DNN training, particularly in real-world settings where data dynamics require models to be frequently retrained. I will give a preview of our early work on Modyn, an open-source platform and benchmark suite for ML training on dynamic datasets, which enables researchers to explore data selection and retraining policies.

Short CV

Ana Klimovic is an Assistant Professor in the Systems Group of the Computer Science Department at ETH Zurich. Her research interests span operating systems, computer architecture, and their intersection with machine learning. Ana's work focuses on computer system design for large-scale applications such as cloud computing services, data analytics, and machine learning. Before joining ETH in August 2020, Ana was a Research Scientist at Google Brain and completed her Ph.D. in Electrical Engineering at Stanford University.
