This project seminar is split into two phases. In the lecture phase, we build a database system from scratch. We develop basic concepts of modern databases step by step, with a focus on in-memory technology and accompanied by practical programming exercises. In the project phase, we concentrate on a single, more complex concept, which we will implement in our research DBMS Hyrise (open source). In both phases, you will work in teams of three students.
In the lecture phase (April 20 to June 8), fundamental database components (storage, compression, operators) are presented and then implemented by each group independently. Provided interfaces and test cases make it easier to get started. We deepen our understanding of C++, the overall architecture, and databases through code reviews and by discussing different implementation ideas.
In the project phase (starting June 8), we work on a larger code base. We implement new components with guidance from the teaching staff, who will give you feedback during weekly meetings. Here, we focus on different advanced optimization techniques. We can build on existing test cases (>90% test coverage) and benchmark suites. Because the complexity of our database system makes it hard to get a complete overview of the code base within a few weeks, we isolate achievable problems and provide them as project topics. Thus, you gain experience in familiarizing yourself with a complex system, with the help and advice of the teaching staff.
This seminar is a perfect choice if you want to
- understand how modern main-memory databases work,
- gain experience in developing complex, high-performance systems,
- improve your C++20 knowledge, and
- work in a smaller group as part of a large open-source software project.
This seminar also serves as a solid basis for ongoing work and research on database topics, e.g., in a master's project or thesis.
Seminar Structure
The seminar will be held in person.
- Lecture phase (April 20 to June 8):
  - Sprint 1: Basic table functionality, e.g., column-based data management of different data types
  - Sprint 2: Dictionary encoding
  - Sprint 3: Table scan as the first database operator
  - Sprint 4: Introduction to Hyrise
- Project phase (starting June 8): Implementation of selected components (not yet final), such as
  - maintenance of data dependencies for dynamic data,
  - enabling another SQL functionality for analytic purposes, or
  - development of an efficient encoding for variable-width string segments.
- Final presentation (planned for August 3)
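To give a flavor of the lecture-phase topics, the following is a minimal sketch of dictionary encoding (Sprint 2) combined with an equality table scan (Sprint 3). It is only an illustration under simplifying assumptions: the names `DictionaryColumn`, `encode`, and `scan_equals` are hypothetical and are neither the interfaces handed out in the seminar nor Hyrise's actual classes.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>
#include <vector>

// Hypothetical, simplified dictionary-encoded string column.
struct DictionaryColumn {
  std::vector<std::string> dictionary;   // sorted, unique values
  std::vector<std::uint32_t> value_ids;  // one dictionary index per row
};

// Builds a sorted dictionary and replaces each value by its dictionary index.
DictionaryColumn encode(const std::vector<std::string>& values) {
  DictionaryColumn column;
  column.dictionary = values;
  std::sort(column.dictionary.begin(), column.dictionary.end());
  column.dictionary.erase(
      std::unique(column.dictionary.begin(), column.dictionary.end()),
      column.dictionary.end());

  column.value_ids.reserve(values.size());
  for (const auto& value : values) {
    const auto it = std::lower_bound(column.dictionary.begin(),
                                     column.dictionary.end(), value);
    // Every input value must be present in the dictionary built above.
    assert(it != column.dictionary.end() && *it == value);
    column.value_ids.push_back(
        static_cast<std::uint32_t>(it - column.dictionary.begin()));
  }
  return column;
}

// Equality scan: returns the row positions whose value equals search_value.
// The string comparison happens once on the dictionary; the per-row work
// is a cheap integer comparison on the value ids.
std::vector<std::size_t> scan_equals(const DictionaryColumn& column,
                                     const std::string& search_value) {
  std::vector<std::size_t> positions;
  const auto it = std::lower_bound(column.dictionary.begin(),
                                   column.dictionary.end(), search_value);
  if (it == column.dictionary.end() || *it != search_value) {
    return positions;  // value not in the dictionary, so no row can match
  }
  const auto search_id =
      static_cast<std::uint32_t>(it - column.dictionary.begin());
  for (std::size_t row = 0; row < column.value_ids.size(); ++row) {
    if (column.value_ids[row] == search_id) {
      positions.push_back(row);
    }
  }
  return positions;
}
```

Keeping the dictionary sorted is a design choice of this sketch: it allows binary search for lookups and would also let range predicates be evaluated on value ids alone.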
Prerequisites
- Basic knowledge of (modern) C++.
- A fundamental understanding of databases (DBS I) is advantageous but optional.
Teaching Form and Learning Process
- Lectures for fundamental introduction to the implementation of database concepts and required C++ knowledge
- Group-based software project
- Weekly meetings with your advisor
Grading and Deliverables
- Programming tasks
- Sprint implementations
- Code review of another group's sprint implementations
- Project implementation
- Code review of another group's project implementation
- Project presentation
- Active participation in seminar meetings
Criteria for programming tasks are, besides functionality:
- Code quality
- Performance
- Test coverage
Learning Goals
After attending this course, students should
- have a deep understanding of fundamental concepts of modern databases,
- be able to familiarize themselves with a complex code base and implement components in it,
- have an intuition for designing performance experiments and the ability to interpret, analyze, and discuss the results, and
- have gained experience in the application of modern C++ concepts.
Course Material
All materials will be available in a Moodle course.
Literature
Hasso Plattner. A Course in In-Memory Data Management: The Inner Mechanics of In-Memory Databases, Second Edition (2014). DOI: 10.1007/978-3-642-55270-0.