Hasso-Plattner-Institut
Prof. Dr. Felix Naumann
 

Develop Your Own Database

General Information

Description

This project seminar is split into two phases. In the lecture phase, we build a database system from scratch, developing basic concepts of modern databases step by step with a focus on in-memory technology, accompanied by practical programming exercises. In the project phase, each team concentrates on a single, more complex concept, which it implements in our open-source research DBMS Hyrise. In both phases, you will work in teams of three students.

In the lecture phase (April 20 to June 8), fundamental database components (storage, compression, operators) are presented and then implemented by each group independently. Provided interfaces and test cases make it easier to get started. Through code reviews and discussions of different implementation ideas, we deepen our understanding of C++, the overall system architecture, and database concepts.

In the project phase (starting June 8), we work on a larger code base. Teams implement new components under the guidance of the teaching staff, who give feedback during weekly meetings. The focus here is on advanced optimization techniques. We benefit from existing test cases (>90% test coverage) and benchmark suites. Because our database system is too complex to gain a complete overview of its code base within a few weeks, we isolate achievable problems and offer them as project topics. This way, you gain experience in familiarizing yourself with a complex system, with the help and advice of the teaching staff.

This seminar is a perfect choice if you want to

  • understand how modern main-memory databases work,
  • gain experience in developing complex, high-performance systems,
  • improve your C++20 knowledge, and
  • work in a small group as part of a large open-source software project.

This seminar also serves as a solid basis for ongoing work and research on database topics, e.g., in a master's project or thesis.

Seminar Structure

The seminar will be held in person.

  • Lecture phase (April 20 to June 8):
    • Sprint 1: Basic table functionality, e.g., column-based data management of different data types
    • Sprint 2: Dictionary encoding
    • Sprint 3: Table scan as the first database operator
    • Sprint 4: Introduction to Hyrise
  • Project phase (starting June 8): Implementation of selected components (topics not yet final), such as
    • maintenance of data dependencies for dynamic data,
    • enabling additional SQL functionality for analytical queries, or
    • development of an efficient encoding for variable-width string segments.
  • Final presentation (planned for August 3)

Prerequisites

  • Basic knowledge of (modern) C++.
  • A fundamental understanding of databases (DBS I) is advantageous but optional.

Teaching Form and Learning Process

  • Lectures introducing the implementation of fundamental database concepts and the required C++ knowledge
  • Group-based software project
  • Weekly meetings with your advisor

Grading and Deliverables

  • Programming tasks
    • Sprint implementations
    • Code review of other groups’ sprint implementations
    • Project implementation
    • Code review of other groups’ project implementations
  • Project presentation
  • Active participation in seminar meetings

Besides functionality, programming tasks are assessed on:

  • Code quality
  • Performance
  • Test coverage

Learning Goals

After attending this course, students should

  • have a deep understanding of fundamental concepts of modern databases,
  • be able to familiarize themselves with a complex code base and implement components in it,
  • have an intuition for designing performance experiments and be able to interpret, analyze, and discuss the results, and
  • have gained experience in applying modern C++ concepts.

Course Material

All materials will be available in a Moodle course.

Literature

Hasso Plattner. A Course in In-Memory Data Management: The Inner Mechanics of In-Memory Databases, Second Edition. Springer, 2014. DOI: 10.1007/978-3-642-55270-0.

Enrollment

The seminar is designed to accommodate at least six groups (instead of three) of three members each, working on separate projects. Thus, participation may be limited to 18 students (increased limit!) to guarantee appropriate supervision. However, in recent semesters, all interested students were able to participate. If you are interested, please enroll in Moodle by April 26. This way, groups can be formed by the second meeting, and nobody has to deliver work without knowing whether they can take part.