Develop Your Own Database

General Information

Teaching team: Daniel Lindner, Marcel Weisgut, Martin Boissier, Stefan Halfpap, Thomas Bodner, Prof. Dr. Felix Naumann, Prof. Dr. Tilmann Rabl
Weekly hours: 4
ECTS: 6 (graded)
Schedule: Thursdays, 9:15 am
- Introduction on April 20, 2023
- Lecture phase from April 20 to June 8
- Project phase from June 8 (topic assignment) to August 24
- Final presentation: August 3
Room: L-1.06
Enrolment period: April 1 to May 7 (please enroll and form groups by April 26, 2023)
Modules:
- IT-Systems Engineering MA: BPET T/K/S, OSIS T/K/S
- Data Engineering MA: DASY T/K/S
Teaching Form: Seminar / Exercise
Enrolment Type: Compulsory Elective Module
Course Language: German
Maximum number of participants: 18 (increased limit!)
Moodle: https://moodle.hpi.de/course/view.php?id=429 (enrollment key: namespace opossum)
Slides for the first lecture are provided here.

Description

This project seminar is split into two phases. In the lecture phase, we build a database system from scratch. We develop basic concepts of modern databases step by step, with a focus on in-memory technology and accompanied by practical programming exercises. In the project phase, we concentrate on a single, more complex concept, which we will implement in our research DBMS Hyrise (open source). In both phases, you will work in teams of three students.

In the lecture phase (April 20 to June 8), fundamental database components (storage, compression, operators) are presented and then implemented by each group independently. Provided interfaces and test cases make it easier to get started. We deepen our understanding of C++, the overall architecture, and databases through code reviews and by discussing different implementation ideas.

In the project phase (starting June 8), we work on a larger code base. We implement new components under the guidance of the teaching staff, who will give you feedback during weekly meetings. Here, we focus on different advanced optimization techniques. We can profit from existing test cases (>90% test coverage) and benchmark suites. Since the complexity of our database system makes it hard to get a complete overview of the code base within a few weeks, we focus on isolating achievable problems and providing them as project topics. Thus, you gain experience in familiarizing yourself with a complex system, with the help and advice of the teaching staff.

This seminar is a perfect choice if you want to

understand how modern main-memory databases work,
gain experience in the development of complex, high-performance systems,
improve your C++20 knowledge, and
work in a smaller group as part of a large open-source software project.

This seminar also serves as a solid basis for ongoing work and research on database topics, e.g., in a master's project or thesis.

Seminar Structure

The seminar will be held in person.

Lecture phase (April 20 to June 8):
- Sprint 1: Basic table functionality, e.g., column-based data management of different data types
- Sprint 2: Dictionary encoding
- Sprint 3: Table scan as the first database operator
- Sprint 4: Introduction to Hyrise
Project phase (starting June 8): Implementation of selected components (not yet final), such as
- maintenance of data dependencies for dynamic data,
- enabling another SQL functionality for analytic purposes, or
- development of an efficient encoding for variable-width string segments.
Final presentation (planned for August 3)
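
To give a flavor of what Sprints 2 and 3 are about, the following is a minimal sketch of dictionary encoding and an equality scan on an encoded column. It is an illustration only: the names DictionaryColumn, encode, and scan_equals are hypothetical and are not taken from Hyrise or from the sprint interfaces, and the sketch assumes a single string column with a sorted dictionary.

```cpp
#include <algorithm>
#include <cstddef>
#include <cstdint>
#include <string>
#include <vector>

// Hypothetical illustration type; not part of Hyrise or the course material.
struct DictionaryColumn {
  std::vector<std::string> dictionary;   // sorted, distinct values
  std::vector<std::uint32_t> value_ids;  // one dictionary index per row
};

// Dictionary-encode a column: store each distinct value once and keep
// only its dictionary index for every row.
DictionaryColumn encode(const std::vector<std::string>& values) {
  DictionaryColumn column;
  column.dictionary = values;
  std::sort(column.dictionary.begin(), column.dictionary.end());
  column.dictionary.erase(
      std::unique(column.dictionary.begin(), column.dictionary.end()),
      column.dictionary.end());

  column.value_ids.reserve(values.size());
  for (const auto& value : values) {
    const auto it = std::lower_bound(column.dictionary.begin(),
                                     column.dictionary.end(), value);
    column.value_ids.push_back(
        static_cast<std::uint32_t>(it - column.dictionary.begin()));
  }
  return column;
}

// Equality scan: look the search value up once in the dictionary, then
// compare integer value IDs per row instead of comparing strings.
std::vector<std::size_t> scan_equals(const DictionaryColumn& column,
                                     const std::string& search_value) {
  std::vector<std::size_t> matching_rows;
  const auto it = std::lower_bound(column.dictionary.begin(),
                                   column.dictionary.end(), search_value);
  if (it == column.dictionary.end() || *it != search_value) {
    return matching_rows;  // value not in the dictionary, so no row matches
  }
  const auto search_id =
      static_cast<std::uint32_t>(it - column.dictionary.begin());
  for (std::size_t row = 0; row < column.value_ids.size(); ++row) {
    if (column.value_ids[row] == search_id) {
      matching_rows.push_back(row);
    }
  }
  return matching_rows;
}
```

For the input values Berlin, Potsdam, Berlin, Hamburg, the sorted dictionary in this sketch holds three entries, and a scan for Berlin returns row positions 0 and 2. The sprint implementations go further, for example by supporting multiple data types and other comparison operators.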

Prerequisites

Basic knowledge of (modern) C++.
A fundamental understanding of databases (DBS I) is advantageous but optional.

Teaching Form and Learning Process

Lectures providing a fundamental introduction to the implementation of database concepts and the required C++ knowledge
Group-based software project
Weekly meetings with your advisor

Grading and Deliverables

Programming tasks
- Sprint implementations
- Code review of another group's sprint implementations
- Project implementation
- Code review of another group's project implementation
Project presentation
Active participation in seminar meetings

Criteria for programming tasks are, besides functionality:

Code quality
Performance
Test coverage

Learning Goals

After attending this course, students should

have a deep understanding of fundamental concepts of modern databases,
be able to familiarize themselves with a complex code base and implement components in it,
have an intuition for designing performance experiments and the ability to interpret, analyze, and discuss the results, and
have gained experience in the application of modern C++ concepts.

Course Material

All materials will be available in a Moodle course.

Literature

Hasso Plattner. A Course in In-Memory Data Management: The Inner Mechanics of In-Memory Databases, Second Edition (2014). DOI: 10.1007/978-3-642-55270-0.

Enrollment

The seminar is designed to allow up to six (instead of three) groups of three members each, working on distinct projects. Thus, the seminar may be limited to 18 students (increased limit!) to guarantee appropriate supervision. However, in recent semesters, all interested students could participate. If you are interested, please enroll by April 26 (Moodle). This way, you can form groups by the second meeting, and nobody has to deliver work without knowing if they can attend.