Hasso-Plattner-Institut
Prof. Dr. Tilmann Rabl
 

Hardware-Conscious Data Processing

Instructors

Prof. Dr. Tilmann Rabl, Martin Boissier, Florian Schmeller

Description

Hardware development advances continuously, with different technologies improving at different paces. While the number of transistors in a CPU package keeps growing, single-core performance stagnates due to physical limitations. These trends require changes in data processing to keep database management systems efficient. In this lecture, we will take a look at current computer architectures and accelerator technologies and how they can be used for efficient data processing. We will cover CPU and memory architecture, the storage hierarchy, modern memory and storage technologies such as NVMe, fast interconnects such as InfiniBand, NVLink, and CXL, and accelerators such as GPUs and FPGAs. The course has a significant practical part in which students learn to implement data structures and algorithms tailored to hardware-conscious data processing.

Literature

  • Structured Computer Organization, Andrew S. Tanenbaum, Todd Austin, 2012, 978-0132916523
  • A Course in In-Memory Data Management: The Inner Mechanics of In-Memory Databases, Hasso Plattner, 2014, 978-3642552694

Announcements

  • Course management will be done using the HPI Moodle
  • The lectures will be held on-site at HPI (L-E.03)
  • Non-HPI participants: please send us an email to get access to the Moodle
  • All lectures are recorded and available on TeleTask

Topics

The lectures will be held on Tuesdays and Wednesdays at 11:00 in L-E.03, unless a different room is noted in the schedule.
Preliminary schedule:

Date  | Tuesday                       | Date  | Wednesday
14.4. | No Lecture                    | 15.4. | No Lecture
21.4. | Introduction                  | 22.4. | CPU Basics & Exam 0 (not graded)
28.4. | CPU Basics                    | 29.4. | CPU Instructions
5.5.  | SIMD I                        | 6.5.  | Task 1 (SIMD) & Exam 1
12.5. | SIMD II                       | 13.5. | Profiling Session
19.5. | Prefetching                   | 20.5. | Data Structures
26.5. | Multicore                     | 27.5. | Task 2 (ART) & Exam 2
2.6.  | Multicore II                  | 3.6.  | Locking
9.6.  | NUMA                          | 10.6. | Execution Models
16.6. | Storage                       | 17.6. | Task 3 (Query Processing) & Exam 3
18.6. | Invited Talk by Prof. Gustavo Alonso (L-1.02, 10 AM)
23.6. | Compute Express Link (F-E.06) | 24.6. | Networking (L-1.02)
30.6. | RDMA                          | 1.7.  | FPGA I & Exam 4
7.7.  | FPGA I                        | 8.7.  | GPU I
14.7. | GPU II                        | 15.7. | Task 3 Recap & Exam 5
21.7. | Data Center Tour              | 22.7. | Q&A & Summary

Grading

There is no final exam. The grade is composed of the programming tasks (75%) and multiple-choice exams (25%). Each of the three programming tasks counts for 25% of the final grade. In addition, there will be five multiple-choice exams during the semester, each counting for 5% of the final grade. Beyond the graded tasks, each student will present their solution for one task in a short individual meeting with the teaching team. We will randomly select students for each current task throughout the semester. Passing this discussion is mandatory to complete the course.

Prerequisites

This course is aimed at students with knowledge of database and/or big data systems. Ideally, students have attended at least one of Big Data Systems, Distributed Data Management, Database Systems II, or a similar course. All programming tasks are in C++, so students should be comfortable with the language. We provide a small example task (see Example Coding Task in Moodle) that students can do before the course to check whether they are comfortable with C++. If you are not able to solve this task, you will probably have a very hard time in the course, as it represents the bare minimum level needed to complete the other tasks.