Hasso-Plattner-Institut
Prof. Dr. Felix Naumann
 

Building Machine Learning Applications

In this seminar, we address challenges that arise when machine learning (ML) applications are developed for real-world use. Often, only a few of the open questions concern the actual ML code [1], while a number of other major challenges must be addressed, such as

  • Is the ML model simple enough w.r.t. model tuning, serving costs, etc.?
  • Where can we get large labeled datasets for training and testing?
  • How can we train the ML model over dirty data?
  • What is the best configuration for the selected ML model?
  • etc.

In this seminar, we aim to answer these questions by applying recent research methods and approaches that simplify ML application development, such as [2-6]. Students will choose a use case and develop an ML application using approaches such as automatic test data generation, AutoML, hyperparameter optimization, and ML over dirty data. Applications may address use cases such as e-commerce [7], logistics, transport [8, 9], fraud detection [10, 11, 12], finance [13], healthcare [14], and online advertising [15]. Further use cases and suggestions are most welcome!
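
To give a feel for one of the approaches named above, the following is a minimal sketch of hyperparameter optimization by random search. It is only an illustration under stated assumptions: the objective function validation_loss, the parameter names learning_rate and depth, and their ranges are hypothetical stand-ins for training and evaluating a real model. It is not the method of any cited paper; frameworks such as Optuna [4] address the same problem.

```python
import random


def validation_loss(learning_rate, depth):
    # Hypothetical stand-in for "train a model and measure its validation loss".
    # The minimum is placed at learning_rate = 0.1 and depth = 6 for illustration.
    return (learning_rate - 0.1) ** 2 + (depth - 6) ** 2 / 100.0


def random_search(objective, n_trials=200, seed=0):
    """Sample random configurations and keep the one with the lowest loss."""
    rng = random.Random(seed)  # fixed seed so that runs are reproducible
    best_params, best_loss = None, float("inf")
    for _ in range(n_trials):
        params = {
            # Log-uniform sample in [0.001, 1] (assumed search range).
            "learning_rate": 10 ** rng.uniform(-3, 0),
            # Integer sample in [2, 12] (assumed search range).
            "depth": rng.randint(2, 12),
        }
        loss = objective(**params)
        if loss < best_loss:
            best_params, best_loss = params, loss
    return best_params, best_loss


best_params, best_loss = random_search(validation_loss)
print("best configuration:", best_params)
print("validation loss:", best_loss)
```

In a real project, validation_loss would train the chosen model with the sampled configuration and return its error on held-out data. The search loop itself stays the same.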

Organization

  • Project seminar for master's students
  • 6 credit points, 4 SWS
  • Weekly meetings: either as group meetings or individual team meetings with a supervisor
  • Supervisor: Dr.-Ing. Alexander Albrecht (assisted by Dr. Thorsten Papenbrock)
  • The first date serves as an introduction to the topic and the seminar.

Working in teams of two, students will complete the following tasks (grading weights in parentheses):

  • (10%) Active participation during all seminar events.
  • (20%) Intermediate presentation demonstrating insights regarding your research project.
  • (0%) Regular meetings with advisor.
  • (20%) Implementation of your ML application.
  • (20%) Final presentation demonstrating your ML application.
  • (30%) Code & documentation (on GitHub). The documentation should contain information on how to execute and evaluate your solution. Furthermore, it should also show strengths and weaknesses of the implementation.

Time Table

When: Thursday, 11:00 AM - 12:30 PM
Where: Online on Google Meet (https://meet.google.com/xzp-hnbd-dii)

Date     Topic
15.04.   Introduction
22.04.   Kubeflow. The ML toolkit for Kubernetes. Benjamin Feldmann (bakdata)
06.05.   First Presentations: Use-Case & Algorithm
10.06.   Discussion: Implementation Approach
22.07.   Final Presentations

Literature

[1] D. Sculley, G. Holt, D. Golovin, E. Davydov, T. Phillips, D. Ebner, V. Chaudhary, M. Young, and J.-F. Crespo. Hidden Technical Debt in Machine Learning Systems. NIPS. 2015.
[2] A. Ratner, S. H. Bach, H. Ehrenberg, J. Fries, S. Wu, and C. Ré. Snorkel: Rapid Training Data Creation with Weak Supervision. VLDB Journal. 2019.
[3] Y. Heffetz, R. Vainshtein, G. Katz, and L. Rokach. DeepLine: AutoML Tool for Pipelines Generation using Deep Reinforcement Learning and Hierarchical Actions Filtering. KDD. 2020.
[4] T. Akiba, S. Sano, T. Yanase, T. Ohta, and M. Koyama. Optuna: A Next-generation Hyperparameter Optimization Framework. KDD. 2019.
[5] S. Gershtein, T. Milo, G. Morami, and S. Novgorodov. Minimization of Classifier Construction Cost for Search Queries. SIGMOD. 2020.
[6] J. Picado, J. Davis, A. Termehchy, and G. Y. Lee. Learning Over Dirty Data Without Cleaning. SIGMOD. 2020.
[7] S. Rayana and L. Akoglu. Collective Opinion Spam Detection: Bridging Review Networks and Metadata. KDD. 2015.
[8] Y. Tong, Y. Chen, Z. Zhou, L. Chen, J. Wang, Q. Yang, and J. Ye. The Simpler The Better: A Unified Approach to Predicting Original Taxi Demands on Large-Scale Online Platforms. KDD. 2017.
[9] R. Barnes, S. Buthpitiya, J. Cook, A. Fabrikant, A. Tomkins, and F. Xu. BusTr: Predicting Bus Travel Times from Real-Time Traffic. KDD. 2020.
[10] F. Victor and A. M. Weintraud. Detecting and Quantifying Wash Trading on Decentralized Cryptocurrency Exchanges. WWW. 2021.
[11] M. Weber, G. Domeniconi, J. Chen, D. K. I. Weidele, C. Bellei, T. Robinson, and C. E. Leiserson. Anti-Money Laundering in Bitcoin: Experimenting with Graph Convolutional Networks for Financial Forensics. KDD Workshop on Anomaly Detection in Finance. 2019.
[12] X. Li, S. Liu, Z. Li, X. Han, C. Shi, B. Hooi, H. Huang, and X. Cheng. FlowScope: Spotting Money Laundering Based on Graphs. AAAI. 2020.
[13] W. Xu, W. Liu, C. Xu, J. Bian, J. Yin, and T.-Y. Liu. REST: Relational Event-driven Stock Trend Forecasting. WWW. 2021.
[14] T. Sethi, A. Mittal, S. Maheshwari, and S. Chugh. Learning to Address Health Inequality in the United States with a Bayesian Decision Network. AAAI. 2019.
[15] K.-C. Lee, B. Orten, A. Dasdan, and W. Li. Estimating Conversion Rate in Display Advertising from Past Performance Data. KDD. 2012.