Building Machine Learning Applications

In this seminar, we address challenges that arise when machine learning (ML) applications are developed for real-world use. Often, only a few open questions concern the actual ML code [1], while a number of other major challenges must be addressed, such as:

Is the ML model simple enough w.r.t. model tuning, serving costs, etc.?
Where can we get large labeled datasets for training and testing?
How can we train the ML model over dirty data?
What is the best configuration for the selected ML model?
etc.

In this seminar, we aim to answer these questions by applying recent research methods and approaches that simplify ML application development, such as [2-6]. Students will choose a use case and develop an ML application using approaches such as automatic test data generation, AutoML, hyperparameter optimization, and ML over dirty data. Applications may address use cases such as e-commerce [7], logistics, transport [8, 9], fraud detection [10, 11, 12], finance [13], healthcare [14], and online advertising [15]. Further use cases and suggestions are most welcome!

Organization

Project seminar for master's students
6 credit points, 4 SWS
Weekly meetings: either as group meetings or individual team meetings with a supervisor
Supervisor: Dr.-Ing. Alexander Albrecht (assisted by Dr. Thorsten Papenbrock)
The first session serves as an introduction to the topic and the seminar.

In teams of two, students will complete the following tasks (percentages indicate the grading weight):

(10%) Active participation during all seminar events.
(20%) Intermediate presentation demonstrating insights regarding your research project.
(0%) Regular meetings with advisor.
(20%) Implementation of your ML application.
(20%) Final presentation demonstrating your ML application.
(30%) Code & documentation (on GitHub). The documentation should explain how to execute and evaluate your solution. It should also discuss the strengths and weaknesses of the implementation.

Time Table

When: Thursday, 11:00 AM - 12:30 PM
Where: Online on Google Meet (https://meet.google.com/xzp-hnbd-dii)

Date	Topic
15.04.	Introduction
22.04.	Kubeflow. The ML toolkit for Kubernetes. Benjamin Feldmann (bakdata)
06.05.	First Presentations: Use-Case & Algorithm
10.06.	Discussion: Implementation Approach
22.07.	Final Presentations

Literature

[1] D. Sculley, G. Holt, D. Golovin, E. Davydov, T. Phillips, D. Ebner, V. Chaudhary, M. Young, and J.-F. Crespo. Hidden Technical Debt in Machine Learning Systems. NIPS. 2015.
[2] A. Ratner, S. H. Bach, H. Ehrenberg, J. Fries, S. Wu, and C. Ré. Snorkel: Rapid Training Data Creation with Weak Supervision. VLDB Journal. 2019.
[3] Y. Heffetz, R. Vainshtein, G. Katz, and L. Rokach. DeepLine: AutoML Tool for Pipelines Generation using Deep Reinforcement Learning and Hierarchical Actions Filtering. KDD. 2020.
[4] T. Akiba, S. Sano, T. Yanase, T. Ohta, and M. Koyama. Optuna: A Next-generation Hyperparameter Optimization Framework. KDD. 2019.
[5] S. Gershtein, T. Milo, G. Morami, and S. Novgorodov. Minimization of Classifier Construction Cost for Search Queries. SIGMOD. 2020.
[6] J. Picado, J. Davis, A. Termehchy, and G. Y. Lee. Learning Over Dirty Data Without Cleaning. SIGMOD. 2020.
[7] S. Rayana and L. Akoglu. Collective Opinion Spam Detection: Bridging Review Networks and Metadata. KDD. 2015.
[8] Y. Tong, Y. Chen, Z. Zhou, L. Chen, J. Wang, Q. Yang, and J. Ye. The Simpler The Better: A Unified Approach to Predicting Original Taxi Demands on Large-Scale Online Platforms. KDD. 2017.
[9] R. Barnes, S. Buthpitiya, J. Cook, A. Fabrikant, A. Tomkins, and F. Xu. BusTr: Predicting Bus Travel Times from Real-Time Traffic. KDD. 2020.
[10] F. Victor and A. M. Weintraud. Detecting and Quantifying Wash Trading on Decentralized Cryptocurrency Exchanges. WWW. 2021.
[11] M. Weber, G. Domeniconi, J. Chen, D. K. I. Weidele, C. Bellei, T. Robinson, and C. E. Leiserson. Anti-Money Laundering in Bitcoin: Experimenting with Graph Convolutional Networks for Financial Forensics. KDD Workshop on Anomaly Detection in Finance. 2019.
[12] X. Li, S. Liu, Z. Li, X. Han, C. Shi, B. Hooi, H. Huang, and X. Cheng. FlowScope: Spotting Money Laundering Based on Graphs. AAAI. 2020.
[13] W. Xu, W. Liu, C. Xu, J. Bian, J. Yin, and T.-Y. Liu. REST: Relational Event-driven Stock Trend Forecasting. WWW. 2021.
[14] T. Sethi, A. Mittal, S. Maheshwari, and S. Chugh. Learning to Address Health Inequality in the United States with a Bayesian Decision Network. AAAI. 2019.
[15] K.-C. Lee, B. Orten, A. Dasdan, and W. Li. Estimating Conversion Rate in Display Advertising from Past Performance Data. KDD. 2012.