Hasso-Plattner-Institut
Prof. Tilmann Rabl

Lecture on Big Data Systems

Instructors

Prof. Dr. Tilmann Rabl

Description

The amount of data that can be generated and stored in academic and industrial projects and applications is increasing rapidly. Big data analytics technologies have established themselves as a solution to the scalability problems that traditional database systems face with such data. The vast amounts of newly collected data, however, are usually not as easily analyzed as curated, structured data in a data warehouse. Typically, these data are noisy, of varying format and velocity, and need to be analyzed with techniques from statistics and machine learning rather than pure SQL-like aggregations and drill-downs. Moreover, the results of the analyses are frequently models that are used for decision making and prediction. The complete process of big data analysis is described as a pipeline, which includes data recording, cleaning, integration, modeling, and interpretation.

In this lecture, we will discuss big data systems, i.e., infrastructures that are used to handle all steps in typical big data processing pipelines.

Announcements

  • Course management will be done using the Algorithm Engineering Moodle
  • Non-HPI participants: please send us an email to get access to the Moodle

Schedule (tentative)

The lecture takes place on Tuesdays (HS 3) and Thursdays (HS 2) at 11:00 AM on Campus I.

Date Topic
TU 15.10. Introduction
TH 17.10. cancelled - Retreat Research School
TU 22.10. DBS Recap
TH 24.10. DBS Recap II
TU 29.10. cancelled - 20 Years HPI Celebration
TH 31.10. Reformation Day
TU 05.11. Big Data Stack
TH 07.11. Solution Quiz I
TU 12.11. Benchmarking & Measurement
TH 14.11. Cloud/Container
TU 19.11. Modern Hardware
TH 21.11. File Systems (moved to WED 20.11., 01:30 PM)
TU 26.11. Map/Reduce
TH 28.11. Solution Quiz II
TU 03.12. KV-Stores
TH 05.12. Consistency
TU 10.12. Stream Processing
TH 12.12. Windows
TU 17.12. Tables and State
TH 19.12. Solution Quiz III
TU 07.01. Stream Optimizations
TH 09.01. Solution Quiz IV
TU 14.01. ML Systems
TH 16.01. ML Exec Strategies
TU 21.01. ML Lifecycle
TH 23.01. Graph Processing
TU 28.01. Graph Processing II
TH 30.01. Solution Quiz V
TU 04.02. Q&A
TH 06.02. Final Exam

Grading

The grade will be determined by the exercises and an exam. The time and location of the exam will be announced at least 6 weeks in advance. Successful completion of the exercises is a prerequisite for admission to the exam. In case of low participation, the exam might be replaced by an oral examination.

The grade breakdown is as follows:

5 Exercise sheets (20% of total points)

  • 1 self-assessment (unmarked)
  • 4 graded exercises (5 points each)

Programming Exercises (15% of total points)

  • November (7%)
  • January (8%)

Exam (65% of total points)