Hasso-Plattner-Institut

Machine Learning for Data Streams (Winter Semester 2019/2020)

Lecturers: Dr. Alexander Albrecht (Information Systems), Dr. Thorsten Papenbrock (Information Systems)

General Information

  • Weekly hours per semester: 4
  • ECTS: 6
  • Graded: Yes
  • Enrollment deadline: 30.10.2019
  • Course format: Seminar
  • Enrollment type: Compulsory elective module
  • Language of instruction: German
  • Maximum number of participants: 8

Degree Programs, Module Groups & Modules

IT-Systems Engineering MA
  • IT-Systems Engineering
    • HPI-ITSE-A Analysis
  • IT-Systems Engineering
    • HPI-ITSE-E Design
  • IT-Systems Engineering
    • HPI-ITSE-K Construction
  • IT-Systems Engineering
    • HPI-ITSE-M Maintenance
  • OSIS: Operating Systems & Information Systems Technology
    • HPI-OSIS-K Concepts and Methods
  • OSIS: Operating Systems & Information Systems Technology
    • HPI-OSIS-T Techniques and Tools
  • OSIS: Operating Systems & Information Systems Technology
    • HPI-OSIS-S Specialization
Data Engineering MA


In this seminar, we study novel algorithms that learn from data streams.

Traditional machine learning algorithms are rarely applicable in scenarios with streaming data. Most were designed for offline settings, i.e., the entire data set must be scanned and processed (possibly multiple times) before a decision can be made.
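
To illustrate the contrast with offline learning, here is a minimal sketch of an online learner that sees each example exactly once and keeps only constant state. It is not taken from the seminar material: the perceptron, the names `OnlinePerceptron`, `synthetic_stream`, and `predict_then_update`, and the synthetic data are all assumptions chosen for illustration. In the seminar setting the examples would arrive from a stream such as a Kafka topic rather than from a generator.

```python
import random
from typing import Iterable, Iterator, List, Tuple

Example = Tuple[List[float], int]  # (features, label in {0, 1})


class OnlinePerceptron:
    """Linear classifier updated one example at a time (constant memory)."""

    def __init__(self, n_features: int, learning_rate: float = 1.0) -> None:
        self.weights = [0.0] * n_features
        self.bias = 0.0
        self.learning_rate = learning_rate

    def predict(self, x: List[float]) -> int:
        score = self.bias + sum(w * xi for w, xi in zip(self.weights, x))
        return 1 if score > 0 else 0

    def learn_one(self, x: List[float], y: int) -> None:
        # Perceptron rule: the model changes only when its prediction is wrong.
        error = y - self.predict(x)
        if error != 0:
            for i, xi in enumerate(x):
                self.weights[i] += self.learning_rate * error * xi
            self.bias += self.learning_rate * error


def synthetic_stream(n: int, seed: int = 0) -> Iterator[Example]:
    """Hypothetical stand-in for an unbounded source such as a Kafka topic."""
    rng = random.Random(seed)
    for _ in range(n):
        x = [rng.uniform(-1.0, 1.0), rng.uniform(-1.0, 1.0)]
        yield x, 1 if x[0] + x[1] > 0 else 0


def predict_then_update(model: OnlinePerceptron, stream: Iterable[Example]) -> float:
    """Predict each example first, then learn from it; return the accuracy."""
    correct = 0
    total = 0
    for x, y in stream:
        if model.predict(x) == y:
            correct += 1
        total += 1
        model.learn_one(x, y)  # the example is discarded after this update
    return correct / total if total else 0.0


model = OnlinePerceptron(n_features=2)
accuracy = predict_then_update(model, synthetic_stream(5000))
print(f"accuracy over the stream: {accuracy:.3f}")
```

The model can issue a prediction at any point in the stream, and no example is stored or revisited, which is the property that offline algorithms requiring several full scans lack.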

In this seminar, students will implement, evaluate, and ideally improve machine learning algorithms for data streams from current research projects. We will look at algorithms for classification, regression, clustering, pattern mining, outlier detection, trend detection, and recommender systems.

Each team of two students chooses and presents a challenging research task and implements the proposed solution as a research prototype using the streaming framework Apache Kafka with Kafka Streams.

This is a project seminar: there will be a few weekly lectures, including an introductory lecture and an invited industry talk on stream processing with Apache Kafka. Teams will meet frequently with their supervisor.


Working in teams of two, students will complete the following tasks (percentages indicate the weight in the final grade):

  • (10%) Active participation during all seminar events.
  • (10%) Short presentation of the selected research paper.
  • (15%) Intermediate presentation demonstrating insights regarding your research prototype.
  • (0%) Regular meetings with the advisor.
  • (20%) Implementation of a research prototype with Kafka and Kafka Streams.
  • (15%) Final presentation demonstrating your solution.
  • (30%) Code & documentation (on GitHub). The documentation should explain how to execute and evaluate your solution, and it should discuss the strengths and weaknesses of the implementation.