Hasso-Plattner-Institut
Prof. Dr. Felix Naumann
  
 

Description

In this seminar, we study novel algorithms that learn from data streams.

Traditional machine learning algorithms are rarely applicable to streaming data. Most were designed for offline settings, i.e., the entire data set must be scanned and processed, often multiple times, before a decision can be made. A stream, by contrast, is potentially unbounded, so an algorithm has to update its model incrementally as records arrive.
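To illustrate the difference between offline and streaming processing, here is a minimal sketch of a one-pass outlier detector in Python. It uses Welford's online algorithm to maintain a running mean and variance in constant memory and flags each value the moment it arrives. The names RunningStats and flag_outliers and the threshold of three standard deviations are illustrative assumptions, not part of the seminar material.

```python
import math
from typing import Iterable, Iterator, Tuple


class RunningStats:
    """Welford's online algorithm: mean and variance in one pass, O(1) memory."""

    def __init__(self) -> None:
        self.n = 0       # number of values seen so far
        self.mean = 0.0  # running mean
        self.m2 = 0.0    # running sum of squared deviations from the mean

    def update(self, x: float) -> None:
        """Fold one new value into the statistics without storing it."""
        self.n += 1
        delta = x - self.mean
        self.mean += delta / self.n
        self.m2 += delta * (x - self.mean)

    @property
    def std(self) -> float:
        """Sample standard deviation; 0.0 until at least two values were seen."""
        return math.sqrt(self.m2 / (self.n - 1)) if self.n > 1 else 0.0


def flag_outliers(
    stream: Iterable[float], threshold: float = 3.0
) -> Iterator[Tuple[float, bool]]:
    """Yield (value, is_outlier) per value, using only earlier values."""
    stats = RunningStats()
    for x in stream:
        is_outlier = (
            stats.n >= 2
            and stats.std > 0
            and abs(x - stats.mean) > threshold * stats.std
        )
        yield x, is_outlier
        # Assumption of this sketch: every value, outlier or not, updates the model.
        stats.update(x)


if __name__ == "__main__":
    for value, is_outlier in flag_outliers([10, 11, 9, 10, 10, 50, 10]):
        print(value, "outlier" if is_outlier else "ok")
```

Each decision depends only on the values seen before it, so the detector never rescans the data. An offline method would instead compute the mean and variance over the complete data set before judging any value.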

In this seminar, students will implement, evaluate, and ideally improve machine learning algorithms for data streams from current research projects. We will look at algorithms for classification, regression, clustering, pattern mining, outlier detection, trend detection, and recommender systems.

Each team of two students chooses and presents a challenging research task and implements the proposed solution as a research prototype using the streaming framework Apache Kafka with Kafka Streams.

This is a project seminar: there will be only a few lectures, including an introductory lecture and an invited industry talk on stream processing with Apache Kafka. Teams will meet frequently with the supervisor.

Organization

  • Project seminar for master's students
  • 6 credit points, 4 SWS
  • Weekly meetings: either as group meetings or individual team meetings with a supervisor
  • Supervisor: Dr. Alexander Albrecht (assisted by Dr. Thorsten Papenbrock)
  • The first session serves as an introduction to the topic and the seminar. Afterwards, you can register for the course by sending an informal email to Alexander Albrecht.

Working in teams of two, students will complete the following tasks (grading weights in parentheses):

  • (10%) Active participation during all seminar events.
  • (10%) Short presentation of the selected research paper.
  • (15%) Intermediate presentation demonstrating insights regarding your research prototype.
  • (0%) Regular meetings with the advisor.
  • (20%) Implementation of a research prototype with Kafka and Kafka Streams.
  • (15%) Final presentation demonstrating your solution.
  • (30%) Code & documentation (on GitHub). The documentation should explain how to execute and evaluate your solution, and it should discuss the strengths and weaknesses of the implementation.

Time Table

When: Tuesday, 9:15 - 10:45 AM
Where: F-2.10, Building F, 2nd Floor, Campus II

Date Topic
15.10. Introduction (open to everyone interested)
22.10. Kick-off: Paper Selection & Team Building
19.11. First Presentations: Paper & Algorithm
17.12. Intermediate Presentations: Implementation Approach
04.02. Final Presentations