Hasso-Plattner-Institut
Prof. Dr. Tilmann Rabl
 

Open Source Data Processing

Instructors

Prof. Dr. Tilmann Rabl, Lawrence Benson, Dr. Arvid Heise (Ververica), Fabian Paul (Ververica)

Description

The digital revolution leads to ever-increasing amounts of data and a massively increased pace of data generation. In many use cases, archiving the data for later processing is either impossible or uneconomical, both because of the speed and volume of the data and because the value of data analysis declines quickly over time. This has led to the development of stream processing engines (SPEs), which can analyze large amounts of data in motion. Stream processing poses two major challenges: the handling of time and the handling of potentially endless streams. In this course, we will focus on the SPE Apache Flink and develop code to support its ecosystem.

Ververica

Ververica was founded in 2014 by the original creators of Apache Flink®. The founding team consisted of a group of PostDocs and PhD students from TU Berlin, who worked on Apache Flink and its precursor Stratosphere (in which the HPI was also involved).

Structure

This course will be structured around group software projects. In these group projects, the students will experience the workflow and lifecycle of new features in the large open source ecosystem around Apache Flink. The teams will consist of 2-3 students and be actively supervised by one of the instructors. The goal is for each team to contribute its final result back to the community and for the students to become Apache contributors. For project results to be contributed back, they should be efficient, well tested, and documented. Ververica will present on the ecosystem around Flink, Apache software projects in general, and the Apache way of maintaining and contributing to open source.

Basic requirements

Grading

The final grade will be made up of 60% project and documentation, 25% final presentation, 10% intermediate presentation, and 5% active participation in all sessions.

Announcements

  • The course will be managed via HPI Moodle. This is where we will announce things and share materials. 
  • The course is limited to a maximum of 9 students.
  • Kick-off event on Wednesday, 4th November at 11:00 on Zoom.
  • Due to space limitations and late Corona-related registration, please send us an email (lawrence.benson@hpi.de, tilmann.rabl@hpi.de) by Friday, 6th November 23:59 if you want to participate in this course. You do not need to send us this mail before the first event, so you can attend the intro and then decide whether you want to take the course. Places are not allocated on a first come, first served basis.

Schedule

The course will take place on Wednesdays at 11:00. Due to the current Corona development, the course will kick off virtually. We will evaluate whether in-person meetings are possible once the situation improves. In the regular meetings, the students and their team supervisors will discuss each team's progress. We will also have slots for a kick-off session, some presentations from Ververica, as well as intermediate and final presentations.

Links