Building Machine Learning Applications (Summer Semester 2021)

Lecturer: Dr. Alexander Albrecht (Information Systems)
Course website: https://hpi.de/naumann/teaching/current-courses/ss-21/building-machine-learning-applications.html

General Information

Semester hours per week: 4
ECTS: 6
Graded: Yes
Enrollment period: 18 March 2021 to 9 April 2021
Course format: Project seminar
Enrollment type: Compulsory elective module
Language of instruction: German
Maximum number of participants: 8

Degree Programs, Module Groups & Modules

IT-Systems Engineering MA

IT-Systems Engineering
- HPI-ITSE-A Analysis
- HPI-ITSE-K Construction
OSIS: Operating Systems & Information Systems Technology
- HPI-OSIS-K Concepts and Methods
- HPI-OSIS-T Techniques and Tools
- HPI-OSIS-S Specialization

Data Engineering MA

DATA: Data Analytics
- HPI-DATA-K Concepts and Methods
- HPI-DATA-T Techniques and Tools
- HPI-DATA-S Specialization
SCAL: Scalable Data Systems
- HPI-SCAL-K Concepts and Methods
- HPI-SCAL-T Techniques and Tools
- HPI-SCAL-S Specialization

Description

In this seminar, we will address the challenges that arise when developing machine learning (ML) applications in the real world. Often, only a few of the open questions concern the actual ML code [1], while a number of other major challenges must be addressed, such as:

- Is the ML model simple enough w.r.t. model tuning, serving costs, etc.?
- Where can we get large labeled datasets for training and testing?
- How can we train the ML model on dirty data?
- What is the best configuration for the selected ML model?

In this seminar, we aim to answer these questions by applying new research methods and approaches that simplify ML application development, such as [2-6]. Students will choose a use case and develop an ML application using approaches such as automatic test data generation, AutoML, hyperparameter optimization, and ML over dirty data. Applications may address use cases such as e-commerce [7], logistics, transport [8, 9], fraud detection [10, 11, 12], finance [13], healthcare [14], and online advertising [15]. Further use cases and suggestions are most welcome!

References

[1] D. Sculley, G. Holt, D. Golovin, E. Davydov, T. Phillips, D. Ebner, V. Chaudhary, M. Young, and J.-F. Crespo. Hidden Technical Debt in Machine Learning Systems. In NIPS, 2015.
[2] A. Ratner, S. H. Bach, H. Ehrenberg, J. Fries, S. Wu, and C. Ré. Snorkel: Rapid Training Data Creation with Weak Supervision. VLDB Journal, 2019.
[3] Y. Heffetz, R. Vainshtein, G. Katz, and L. Rokach. DeepLine: AutoML Tool for Pipelines Generation using Deep Reinforcement Learning and Hierarchical Actions Filtering. In KDD, 2020.
[4] T. Akiba, S. Sano, T. Yanase, T. Ohta, and M. Koyama. Optuna: A Next-generation Hyperparameter Optimization Framework. In KDD, 2019.
[5] S. Gershtein, T. Milo, G. Morami, and S. Novgorodov. Minimization of Classifier Construction Cost for Search Queries. In SIGMOD, 2020.
[6] J. Picado, J. Davis, A. Termehchy, and G. Y. Lee. Learning Over Dirty Data Without Cleaning. In SIGMOD, 2020.
[7] S. Rayana and L. Akoglu. Collective Opinion Spam Detection: Bridging Review Networks and Metadata. In KDD, 2015.
[8] Y. Tong, Y. Chen, Z. Zhou, L. Chen, J. Wang, Q. Yang, and J. Ye. The Simpler The Better: A Unified Approach to Predicting Original Taxi Demands on Large-Scale Online Platforms. In KDD, 2017.
[9] R. Barnes, S. Buthpitiya, J. Cook, A. Fabrikant, A. Tomkins, and F. Xu. BusTr: Predicting Bus Travel Times from Real-Time Traffic. In KDD, 2020.
[10] F. Victor and A. M. Weintraud. Detecting and Quantifying Wash Trading on Decentralized Cryptocurrency Exchanges. In WWW, 2021.
[11] M. Weber, G. Domeniconi, J. Chen, D. K. I. Weidele, C. Bellei, T. Robinson, and C. E. Leiserson. Anti-Money Laundering in Bitcoin: Experimenting with Graph Convolutional Networks for Financial Forensics. In KDD Workshop on Anomaly Detection in Finance, 2019.
[12] X. Li, S. Liu, Z. Li, X. Han, C. Shi, B. Hooi, H. Huang, and X. Cheng. FlowScope: Spotting Money Laundering Based on Graphs. In AAAI, 2020.
[13] W. Xu, W. Liu, C. Xu, J. Bian, J. Yin, and T.-Y. Liu. REST: Relational Event-driven Stock Trend Forecasting. In WWW, 2021.
[14] T. Sethi, A. Mittal, S. Maheshwari, and S. Chugh. Learning to Address Health Inequality in the United States with a Bayesian Decision Network. In AAAI, 2019.
[15] K.-C. Lee, B. Orten, A. Dasdan, and W. Li. Estimating Conversion Rate in Display Advertising from Past Performance Data. In KDD, 2012.

Assessment

Students work in teams of two and complete the following tasks (grading weights in parentheses):

- (10%) Active participation in all seminar events.
- (20%) Intermediate presentation demonstrating insights into your research project.
- (0%) Regular meetings with your advisor.
- (20%) Implementation of your ML application.
- (20%) Final presentation demonstrating your ML application.
- (30%) Code and documentation (on GitHub). The documentation should explain how to execute and evaluate your solution. It should also discuss the strengths and weaknesses of the implementation.

Schedule

When: Thursdays, 11:00 AM - 12:30 PM
Where: Online

Please see the seminar's official web page for more details.
