Hasso-Plattner-Institut
Prof. Dr. Tilmann Rabl

Data Management in ML Systems

Instructors

Prof. Dr. Tilmann Rabl, Ilin Tolovski

Description

Distributing the machine learning pipeline has enabled researchers to create increasingly complex model architectures that can be trained on enormous data sets. However, as model architectures grow more complex, training time and model size grow with them. The result is usually very large models that take a long time to train, are inefficient to update, and are difficult to store and retrieve. A large share of the time in a distributed training pipeline is spent on parameter transfers and synchronization between devices in order to coherently advance the training. When it comes to model storage, current research shows that the parameters account for the vast majority of a model's storage footprint (ca. 99%). In this course, we want to address several aspects of the model management life cycle, i.e., efficient communication patterns during training as well as model storage and retrieval. We will develop methods for efficient parameter transfers during the training process, which should minimize network traffic and help reduce training time. Moreover, storing model parameters efficiently will reduce a model's storage footprint and the computational cost of model querying. Employing data management techniques therefore allows us to make distributed ML systems more efficient with regard to training time and storage.
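
As a rough illustration of where this communication cost arises (a minimal sketch, not part of the course materials), the following Python snippet shows one data-parallel training step in PyTorch in which every worker averages its gradients with an all-reduce before the optimizer step. The model, data, and hyperparameters are purely hypothetical.

# Minimal sketch: gradient synchronization in data-parallel training.
# Each worker computes gradients on its own mini-batch; the gradients are
# averaged across workers with an all-reduce before the optimizer step.
# Model, data, and hyperparameters are toy/hypothetical examples.
import os
import torch
import torch.distributed as dist
import torch.multiprocessing as mp


def worker(rank: int, world_size: int) -> None:
    # Single-machine setup with the CPU-friendly "gloo" backend.
    os.environ["MASTER_ADDR"] = "127.0.0.1"
    os.environ["MASTER_PORT"] = "29500"
    dist.init_process_group("gloo", rank=rank, world_size=world_size)

    model = torch.nn.Linear(1024, 1024)  # toy model
    opt = torch.optim.SGD(model.parameters(), lr=0.01)

    # Parameters dominate the storage footprint: ~1M float32 values here, ~4 MiB.
    n_params = sum(p.numel() for p in model.parameters())
    if rank == 0:
        print(f"parameters: {n_params}, ~{n_params * 4 / 2**20:.1f} MiB")

    for _ in range(3):                    # a few toy training steps
        x = torch.randn(32, 1024)         # each worker's local mini-batch
        loss = model(x).pow(2).mean()
        opt.zero_grad()
        loss.backward()

        # Synchronization: average gradients across all workers. This transfer
        # is the communication cost that efficient training aims to reduce.
        for p in model.parameters():
            dist.all_reduce(p.grad, op=dist.ReduceOp.SUM)
            p.grad /= world_size

        opt.step()

    dist.destroy_process_group()


if __name__ == "__main__":
    mp.spawn(worker, args=(2,), nprocs=2)

In this naive version, every parameter tensor is transferred in full on every step; the techniques studied in the course (e.g., smarter communication patterns and compact parameter storage) target exactly this overhead.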

Structure

Project

This seminar will be structured around project work in the field of Data Management in Machine Learning Systems. Students can work in groups of 2-4 to develop a project idea, implement it, and evaluate it. At the end of the course, the students present their findings and hand in a written report on their topic. We offer the possibility of publishing the project results at a topic-related conference.

Paper presentations

In this course, the students will have the opportunity to prepare discussion sessions on state-of-the-art research in machine learning systems. This includes studying a research paper in detail, presenting it to the group, highlighting valuable insights, and leading the subsequent discussion. To prepare for this, we will first discuss best practices for reading, writing, and presenting scientific papers. Ideally, the papers presented in our sessions will cover the related work of the chosen project topics.

Grading

  • Project + report - 60%
  • Final presentation - 20%
  • Paper presentations - 20%

Announcements

  • The course will be conducted on-site at HPI.
  • Course management is handled via Moodle, where we will post announcements and share course materials.
  • HPI Moodle Course
  • The course is limited to 12 students.
  • If you have any questions, please contact me at ilin.tolovski (at) hpi.de

Schedule

  • Week 1 (25.10. - 29.10.): Introduction to the seminar: Data Management in ML Systems
    • Course Logistics
    • Model Management
    • Model Training in Distributed Environments
    • Discussion of open research questions
    • Presentation of project topics
  • Weeks 2 + 3 (01.11. - 12.11.): How to read/write a scientific paper
  • Week 4 (15.11. - 19.11.): Project topic choice (10 min presentation per group) & literature overview of the area
  • Week 5 (22.11. - 26.11.): Related work presentation (Groups 1+2) + Discussion
  • Week 6 (29.11. - 03.12.): Related work presentation (Groups 3+4) + Discussion 
  • Week 7 (06.12. - 10.12.): Related work presentation (Groups 5+6) + Discussion
  • Weeks 8 - 16: Project meetings
    • Status meetings, quick presentations
    • Discussion of the students' current progress
  • Final semester week (14.02. - 18.02.): Final presentations

  • Deadline for reports: 28.02.2022