Hasso-Plattner-Institut
Prof. Dr. Tilmann Rabl

Data Management in ML Systems

Instructors

Prof. Dr. Tilmann Rabl, Ilin Tolovski

Description

Distributing the machine learning pipeline has enabled researchers to create increasingly complex model architectures that can be trained on enormous data sets. However, as model architectures grow more complex, training time and model size grow with them. The result is usually very large models that take a long time to train, are inefficient to update, and are difficult to store and retrieve. A large share of the time in a distributed training pipeline is spent on parameter transfers and synchronization between devices in order to coherently advance the training. When it comes to model storage, current research shows that the parameters account for the vast majority of a model's storage footprint (ca. 99%). In this course, we want to address several aspects of the model management life cycle, i.e., efficient communication patterns during training as well as model storage and retrieval. We will develop methods for efficient parameter transfers during the training process, which should minimize network traffic and help reduce training time. Moreover, storing model parameters efficiently will reduce a model's storage footprint and the computational cost of model querying. Employing data management techniques therefore allows us to make distributed ML systems more efficient with regard to training time and storage.
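
As a rough illustration of where this communication cost arises (a minimal sketch, not part of the course materials), the following Python snippet shows one data-parallel training step in PyTorch in which every worker averages its gradients with an all-reduce before the optimizer step. The model, data, and hyperparameters are purely hypothetical.

# Minimal sketch: gradient synchronization in data-parallel training.
# Each worker computes gradients on its own mini-batch; the gradients are
# averaged across workers with an all-reduce before the optimizer step.
# Model, data, and hyperparameters are toy/hypothetical examples.
import os
import torch
import torch.distributed as dist
import torch.multiprocessing as mp


def worker(rank: int, world_size: int) -> None:
    # Single-machine setup with the CPU-friendly "gloo" backend.
    os.environ["MASTER_ADDR"] = "127.0.0.1"
    os.environ["MASTER_PORT"] = "29500"
    dist.init_process_group("gloo", rank=rank, world_size=world_size)

    model = torch.nn.Linear(1024, 1024)  # toy model
    opt = torch.optim.SGD(model.parameters(), lr=0.01)

    # Parameters dominate the storage footprint: ~1M float32 values here, ~4 MiB.
    n_params = sum(p.numel() for p in model.parameters())
    if rank == 0:
        print(f"parameters: {n_params}, ~{n_params * 4 / 2**20:.1f} MiB")

    for _ in range(3):                    # a few toy training steps
        x = torch.randn(32, 1024)         # each worker's local mini-batch
        loss = model(x).pow(2).mean()
        opt.zero_grad()
        loss.backward()

        # Synchronization: average gradients across all workers. This transfer
        # is the communication cost that efficient training aims to reduce.
        for p in model.parameters():
            dist.all_reduce(p.grad, op=dist.ReduceOp.SUM)
            p.grad /= world_size

        opt.step()

    dist.destroy_process_group()


if __name__ == "__main__":
    mp.spawn(worker, args=(2,), nprocs=2)

In this naive version, every parameter tensor is transferred in full on every step; the techniques studied in the course (e.g., smarter communication patterns and compact parameter storage) target exactly this overhead.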

Structure

Project

This seminar will be structured around project work in the field of Data Management in Machine Learning Systems. Students can work in groups of 2-4 to develop a project idea, implement it, and evaluate it. At the end of the course, the students present their findings and hand in a written report on their topic. We offer the possibility of publishing the project results at a topic-related conference.

Paper presentations

In this course, the students will have the opportunity to prepare discussion sessions on state-of-the-art research in machine learning systems. This includes studying a research paper in detail, presenting it to the group, highlighting valuable insights, and leading the subsequent discussion. To prepare for this, we will first discuss best practices for reading, writing, and presenting scientific papers. Ideally, the papers presented in our sessions will cover the related work of the chosen project topics.

Grading

  • Project + report - 60%
  • Final presentation - 20%
  • Paper presentations - 20%

Announcements

  • The course will be conducted on-site at HPI.
  • Course management is handled via Moodle, where we will post announcements and share course materials.
  • HPI Moodle Course
  • The course is limited to 12 students.
  • If you have any questions, please contact me at ilin.tolovski (at) hpi.de

Schedule

  • Week 1 (25.10. - 29.10.): Introduction to the seminar: Data Management in ML Systems
    • Course Logistics
    • Model Management
    • Model Training in Distributed Environments
    • Discussion of open research questions
    • Presentation of project topics
  • Weeks 2 + 3 (01.11. - 12.11.): How to read/write a scientific paper
  • Week 4 (15.11. - 19.11.): Project topic choice (10 min presentation per group) & literature overview of the area
  • Week 5 (22.11. - 26.11.): Related work presentation (Groups 1+2) + Discussion
  • Week 6 (29.11. - 03.12.): Related work presentation (Groups 3+4) + Discussion 
  • Week 7 (06.12. - 10.12.): Related work presentation (Groups 5+6) + Discussion
  • Weeks 8 - 16: Project meetings
    • Status meetings, quick presentations
    • Discussion of the students' current progress
  • Final semester week (14.02. - 18.02.): Final presentations

  • Deadline for reports: 28.02.2022