Hasso-Plattner-Institut

Building Machine Learning Applications (Summer Semester 2021)

Lecturer: Dr. Alexander Albrecht (Information Systems)
Course website: https://hpi.de/naumann/teaching/current-courses/ss-21/building-machine-learning-applications.html

General Information

  • Weekly hours per semester: 4
  • ECTS: 6
  • Graded: Yes
  • Enrollment period: 18.03.2021 - 09.04.2021
  • Teaching form: Project seminar
  • Enrollment type: Compulsory elective module
  • Teaching language: German
  • Maximum number of participants: 8

Degree Programs & Modules

IT-Systems Engineering MA
  • ITSE-Analyse
  • ITSE-Konstruktion
  • OSIS-Konzepte und Methoden
  • OSIS-Techniken und Werkzeuge
  • OSIS-Spezialisierung
Data Engineering MA


This seminar addresses challenges that arise when machine learning (ML) applications are developed for real-world use. Often, the actual ML code raises only a few questions [1], but a number of other major challenges must be addressed, such as:

  • Is the ML model simple enough w.r.t. model tuning, serving costs, etc.?
  • Where can we get large labeled datasets for training and testing?
  • How can we train the ML model over dirty data?
  • What is the best configuration for the selected ML model?
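
The question of where labeled data comes from can be illustrated with programmatic labeling. The following is a minimal sketch in the spirit of weak supervision [2], not the Snorkel API: it combines a few heuristic labeling functions by simple majority vote, whereas Snorkel learns a model over the labeling functions. The task (spam reviews), the function names, and the heuristics are hypothetical and chosen only for illustration.

```python
from collections import Counter

ABSTAIN = None  # returned by a labeling function that has no opinion


# Hypothetical labeling functions for a toy review-spam task.
def lf_contains_link(text):
    return "spam" if "http" in text.lower() else ABSTAIN


def lf_many_exclamations(text):
    return "spam" if text.count("!") >= 3 else ABSTAIN


def lf_mentions_details(text):
    # Assumption: reviews that mention concrete product aspects are genuine.
    keywords = ("battery", "delivery", "size")
    return "genuine" if any(k in text.lower() for k in keywords) else ABSTAIN


LABELING_FUNCTIONS = [lf_contains_link, lf_many_exclamations, lf_mentions_details]


def weak_label(text, labeling_functions=LABELING_FUNCTIONS):
    """Return the majority label among non-abstaining functions, else None."""
    votes = [lf(text) for lf in labeling_functions]
    votes = [v for v in votes if v is not ABSTAIN]
    if not votes:
        return None
    counts = Counter(votes).most_common()
    # A tie between the two most frequent labels yields no label.
    if len(counts) > 1 and counts[0][1] == counts[1][1]:
        return None
    return counts[0][0]


reviews = [
    "Great deal!!! Visit http://example.com now",
    "Battery lasts two days, delivery was quick",
    "It is fine",
]
labels = [weak_label(r) for r in reviews]
print(labels)
```

Examples for which every function abstains or the votes tie stay unlabeled; they can be left out of the training set or labeled by hand.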

In this seminar, we aim to answer these questions by applying recent research methods and approaches that simplify ML application development, such as [2-6]. Students will choose a use case and develop an ML application using approaches such as automatic test data generation, AutoML, hyperparameter optimization, and ML over dirty data. Applications may address use cases such as e-commerce [7], logistics, transport [8, 9], fraud detection [10, 11, 12], finance [13], healthcare [14], and online advertising [15]. Further use cases and suggestions are most welcome!
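
To make hyperparameter optimization concrete, here is a minimal sketch that uses plain random search. This is a deliberately simple stand-in for dedicated frameworks such as Optuna [4] and does not use their API. The objective `validation_loss`, the parameter names, and the search ranges are hypothetical; in a real project the objective would train a model and return its validation error.

```python
import random


def validation_loss(learning_rate, depth):
    # Hypothetical stand-in for "train a model and measure validation error".
    # By construction, the minimum lies at learning_rate=0.1 and depth=6.
    return (learning_rate - 0.1) ** 2 + 0.01 * (depth - 6) ** 2


def random_search(n_trials, seed=0):
    """Sample configurations at random and keep the one with the lowest loss."""
    rng = random.Random(seed)  # fixed seed so runs are reproducible
    best_params, best_loss = None, float("inf")
    for _ in range(n_trials):
        params = {
            # Log-uniform sample between 0.001 and 1.
            "learning_rate": 10 ** rng.uniform(-3, 0),
            # Integer between 2 and 10, both inclusive.
            "depth": rng.randint(2, 10),
        }
        loss = validation_loss(**params)
        if loss < best_loss:
            best_params, best_loss = params, loss
    return best_params, best_loss


best_params, best_loss = random_search(n_trials=200)
print(best_params, best_loss)
```

Random search treats every trial independently. Frameworks such as Optuna additionally use the results of earlier trials to decide which configuration to try next.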


[1] D. Sculley, G. Holt, D. Golovin, E. Davydov, T. Phillips, D. Ebner, V. Chaudhary, M. Young, and J.-F. Crespo. Hidden Technical Debt in Machine Learning Systems. NIPS. 2015.
[2] A. Ratner, S. H. Bach, H. Ehrenberg, J. Fries, S. Wu, and C. Ré. Snorkel: Rapid Training Data Creation with Weak Supervision. VLDB Journal. 2019.
[3] Y. Heffetz, R. Vainshtein, G. Katz, and L. Rokach. DeepLine: AutoML Tool for Pipelines Generation using Deep Reinforcement Learning and Hierarchical Actions Filtering. KDD. 2020.
[4] T. Akiba, S. Sano, T. Yanase, T. Ohta, and M. Koyama. Optuna: A Next-generation Hyperparameter Optimization Framework. KDD. 2019.
[5] S. Gershtein, T. Milo, G. Morami, and S. Novgorodov. Minimization of Classifier Construction Cost for Search Queries. SIGMOD. 2020.
[6] J. Picado, J. Davis, A. Termehchy, and G. Y. Lee. Learning Over Dirty Data Without Cleaning. SIGMOD. 2020.
[7] S. Rayana and L. Akoglu. Collective Opinion Spam Detection: Bridging Review Networks and Metadata. KDD. 2015.
[8] Y. Tong, Y. Chen, Z. Zhou, L. Chen, J. Wang, Q. Yang, and J. Ye. The Simpler The Better: A Unified Approach to Predicting Original Taxi Demands based on Large-Scale Online Platforms. KDD. 2017.
[9] R. Barnes, S. Buthpitiya, J. Cook, A. Fabrikant, A. Tomkins, and F. Xu. BusTr: Predicting Bus Travel Times from Real-Time Traffic. KDD. 2020.
[10] F. Victor and A. M. Weintraud. Detecting and Quantifying Wash Trading on Decentralized Cryptocurrency Exchanges. WWW. 2021.
[11] M. Weber, G. Domeniconi, J. Chen, D. K. I. Weidele, C. Bellei, T. Robinson, and C. E. Leiserson. Anti-Money Laundering in Bitcoin: Experimenting with Graph Convolutional Networks for Financial Forensics. KDD Workshop on Anomaly Detection in Finance. 2019.
[12] X. Li, S. Liu, Z. Li, X. Han, C. Shi, B. Hooi, H. Huang, and X. Cheng. FlowScope: Spotting Money Laundering Based on Graphs. AAAI. 2020.
[13] W. Xu, W. Liu, C. Xu, J. Bian, J. Yin, and T.-Y. Liu. REST: Relational Event-driven Stock Trend Forecasting. WWW. 2021.
[14] T. Sethi, A. Mittal, S. Maheshwari, and S. Chugh. Learning to Address Health Inequality in the United States with a Bayesian Decision Network. AAAI. 2019.
[15] K.-C. Lee, B. Orten, A. Dasdan, and W. Li. Estimating Conversion Rate in Display Advertising from Past Performance Data. KDD. 2012.


Students will work in teams of two and complete the following tasks (grading weights in parentheses):

  • (10%) Active participation during all seminar events.
  • (20%) Intermediate presentation of insights from your research project.
  • (0%) Regular meetings with your advisor.
  • (20%) Implementation of your ML application.
  • (20%) Final presentation demonstrating your ML application.
  • (30%) Code & documentation (on GitHub). The documentation should explain how to execute and evaluate your solution, and it should discuss the strengths and weaknesses of the implementation.


When: Thursday, 11:00 AM - 12:30 PM
Where: Online

Please see the seminar's official web page for more details.