Building Machine Learning Applications (Summer Semester 2021)
Lecturer:
Dr. Alexander Albrecht
(Information Systems)
Course website:
https://hpi.de/naumann/teaching/current-courses/ss-21/building-machine-learning-applications.html
General Information
- Weekly hours per semester: 4
- ECTS: 6
- Graded: Yes
- Enrollment period: 18.03.2021 - 09.04.2021
- Course type: Project seminar
- Enrollment type: Compulsory elective module
- Language of instruction: German
- Maximum number of participants: 8
Degree Programs, Module Groups & Modules
- IT-Systems Engineering
- IT-Systems Engineering
- OSIS: Operating Systems & Information Systems Technology
- HPI-OSIS-K Concepts and Methods
- OSIS: Operating Systems & Information Systems Technology
- HPI-OSIS-T Techniques and Tools
- OSIS: Operating Systems & Information Systems Technology
- HPI-OSIS-S Specialization
- DATA: Data Analytics
- HPI-DATA-K Concepts and Methods
- DATA: Data Analytics
- HPI-DATA-T Techniques and Tools
- DATA: Data Analytics
- HPI-DATA-S Specialization
- SCAL: Scalable Data Systems
- HPI-SCAL-K Concepts and Methods
- SCAL: Scalable Data Systems
- HPI-SCAL-T Techniques and Tools
- SCAL: Scalable Data Systems
- HPI-SCAL-S Specialization
Description
In this seminar, we address challenges that arise when machine learning (ML) applications are developed in the real world. Often, the actual ML code raises only a few questions [1], while a number of other major challenges must be addressed, such as:
- Is the ML model simple enough w.r.t. model tuning, serving costs, etc.?
- Where can we get large labeled datasets for training and testing?
- How can we train the ML model over dirty data?
- What is the best configuration for the selected ML model?
- etc.
In this seminar, we aim to answer these questions by applying recent research methods and approaches that make ML application development easier, such as [2-6]. Students will choose a use case and develop an ML application using approaches such as automatic test data generation, AutoML, hyperparameter optimization, and ML over dirty data. Applications may address use cases such as e-commerce [7], logistics and transport [8, 9], fraud detection [10, 11, 12], finance [13], healthcare [14], and online advertising [15]. Further use cases and suggestions are most welcome!
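To give a flavor of one of the approaches named above, hyperparameter optimization, here is a minimal sketch of plain random search using only the Python standard library. It is not the method of [4] or of any other cited paper. The objective `validation_loss`, the parameter names `learning_rate` and `depth`, and their search ranges are hypothetical placeholders for training and validating a real model.

```python
import random


def validation_loss(learning_rate, depth):
    """Hypothetical stand-in for training a model and scoring it on held-out data."""
    return (learning_rate - 0.1) ** 2 + 0.01 * (depth - 4) ** 2


def random_search(n_trials, seed=0):
    """Sample configurations at random and keep the one with the lowest loss."""
    rng = random.Random(seed)  # fixed seed makes the search reproducible
    best_config, best_loss = None, float("inf")
    for _ in range(n_trials):
        config = {
            # Log-uniform sample between 0.001 and 1 (assumed range).
            "learning_rate": 10 ** rng.uniform(-3, 0),
            # Integer sample from 2 to 8, both inclusive (assumed range).
            "depth": rng.randint(2, 8),
        }
        loss = validation_loss(**config)
        if loss < best_loss:
            best_config, best_loss = config, loss
    return best_config, best_loss


if __name__ == "__main__":
    config, loss = random_search(n_trials=50)
    print(config, loss)
```

Dedicated frameworks replace the random sampling with smarter strategies and add features such as early stopping of unpromising trials, but the loop of proposing a configuration, evaluating it, and keeping the best has the same shape.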
References
[1] D. Sculley, G. Holt, D. Golovin, E. Davydov, T. Phillips, D. Ebner, V. Chaudhary, M. Young, J.-F. Crespo, and D. Dennison. Hidden Technical Debt in Machine Learning Systems. NIPS. 2015.
[2] A. Ratner, S. H. Bach, H. Ehrenberg, J. Fries, S. Wu, and C. Ré. Snorkel: Rapid Training Data Creation with Weak Supervision. VLDB Journal. 2019.
[3] Y. Heffetz, R. Vainshtein, G. Katz, and L. Rokach. DeepLine: AutoML Tool for Pipelines Generation using Deep Reinforcement Learning and Hierarchical Actions Filtering. KDD. 2020.
[4] T. Akiba, S. Sano, T. Yanase, T. Ohta, and M. Koyama. Optuna: A Next-generation Hyperparameter Optimization Framework. KDD. 2019.
[5] S. Gershtein, T. Milo, G. Morami, and S. Novgorodov. Minimization of Classifier Construction Cost for Search Queries. SIGMOD. 2020.
[6] J. Picado, J. Davis, A. Termehchy, and G. Y. Lee. Learning Over Dirty Data Without Cleaning. SIGMOD. 2020.
[7] S. Rayana and L. Akoglu. Collective Opinion Spam Detection: Bridging Review Networks and Metadata. KDD. 2015.
[8] Y. Tong, Y. Chen, Z. Zhou, L. Chen, J. Wang, Q. Yang, and J. Ye. The Simpler The Better: A Unified Approach to Predicting Original Taxi Demands on Large-Scale Online Platforms. KDD. 2017.
[9] R. Barnes, S. Buthpitiya, J. Cook, A. Fabrikant, A. Tomkins, and F. Xu. BusTr: Predicting Bus Travel Times from Real-Time Traffic. KDD. 2020.
[10] F. Victor and A. M. Weintraud. Detecting and Quantifying Wash Trading on Decentralized Cryptocurrency Exchanges. WWW. 2021.
[11] M. Weber, G. Domeniconi, J. Chen, D. K. I. Weidele, C. Bellei, T. Robinson, and C. E. Leiserson. Anti-Money Laundering in Bitcoin: Experimenting with Graph Convolutional Networks for Financial Forensics. KDD Workshop on Anomaly Detection in Finance. 2019.
[12] X. Li, S. Liu, Z. Li, X. Han, C. Shi, B. Hooi, H. Huang, and X. Cheng. FlowScope: Spotting Money Laundering Based on Graphs. AAAI. 2020.
[13] W. Xu, W. Liu, C. Xu, J. Bian, J. Yin, and T.-Y. Liu. REST: Relational Event-driven Stock Trend Forecasting. WWW. 2021.
[14] T. Sethi, A. Mittal, S. Maheshwari, and S. Chugh. Learning to Address Health Inequality in the United States with a Bayesian Decision Network. AAAI. 2019.
[15] K.-C. Lee, B. Orten, A. Dasdan, and W. Li. Estimating Conversion Rate in Display Advertising from Past Performance Data. KDD. 2012.
Assessment
Working in teams of two, students complete the following tasks (grading weights in parentheses):
- (10%) Active participation during all seminar events.
- (20%) Intermediate presentation demonstrating insights regarding your research project.
- (0%) Regular meetings with advisor.
- (20%) Implementation of your ML application.
- (20%) Final presentation demonstrating your ML application.
- (30%) Code & documentation (on GitHub). The documentation should explain how to execute and evaluate your solution. It should also discuss the strengths and weaknesses of the implementation.
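The weights above sum to 100%. As an illustration of how they could combine into an overall result, here is a minimal sketch. It assumes each component is scored on a 0-100 scale, which this page does not specify. The component keys and the function `weighted_score` are hypothetical names introduced only for this example.

```python
# Grading weights in percent, taken from the list above.
WEIGHTS = {
    "participation": 10,
    "intermediate_presentation": 20,
    "advisor_meetings": 0,
    "implementation": 20,
    "final_presentation": 20,
    "code_and_documentation": 30,
}


def weighted_score(scores):
    """Combine per-component scores (assumed 0-100) into an overall 0-100 score."""
    if set(scores) != set(WEIGHTS):
        raise ValueError("scores must cover exactly the graded components")
    total_weight = sum(WEIGHTS.values())
    return sum(WEIGHTS[name] * scores[name] for name in WEIGHTS) / total_weight


if __name__ == "__main__":
    example = {name: 80 for name in WEIGHTS}  # hypothetical scores
    example["code_and_documentation"] = 90
    print(weighted_score(example))
```

Because the advisor meetings carry a weight of 0%, their score does not change the result in this sketch.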
Schedule
When: Thursday, 11:00 AM - 12:30 PM
Where: Online
Please see the seminar's official web page for more details.