Develop Your Own Database

Overview

Develop Your Own Database (DYOD) is a project seminar for master's students. Teams of students first build a simple database system from scratch. Later, the teams gain practical experience with a large codebase and core database internals by implementing an efficient parallel operator in the open-source database system Hyrise.

This project seminar is split into two phases: a lecture phase and a project phase.

In the lecture phase (April to June), we build a simple database from scratch and teach fundamental database components (e.g., storage, compression, operators), with a focus on in-memory technology. Those components are eventually implemented by each group independently in practical programming exercises. Getting started is eased by given interfaces and test cases. We deepen your understanding of C++, the overall architecture, and database knowledge with code reviews and by discussing different implementation ideas.

In the project phase (starting June), we concentrate on a single, more complex concept, which we will implement in our open-source research DBMS Hyrise. Each group will implement an aggregate operator. As this operator is an integral part of Hyrise’s execution engine, you need to become familiar with the internals of a complex software system. We offer weekly consultations with the teaching staff during that period. You can benefit from existing test cases (>90% test coverage) and benchmark suites.

Aggregate Operator

The task of the group project is the implementation of the aggregate operator. The aggregate operator is used to process requests such as “calculate the sum of orders, grouped by month and year”.
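
To make the example request concrete, here is a minimal sketch of a grouped sum in C++. It is not the Hyrise operator or its interface: the `Order` struct, the `sum_by_month` function, and the row-wise input are assumptions made purely for illustration, and the sketch is single-threaded, whereas the project asks for an efficient parallel implementation.

```cpp
#include <cassert>
#include <cstdint>
#include <map>
#include <utility>
#include <vector>

// Hypothetical row layout, used only for this illustration.
struct Order {
  int year;
  int month;
  std::int64_t amount_cents;
};

using YearMonth = std::pair<int, int>;

// Groups the orders by (year, month) and sums the amounts per group.
// std::map keeps the groups ordered by key, so the result is deterministic.
std::map<YearMonth, std::int64_t> sum_by_month(const std::vector<Order>& orders) {
  std::map<YearMonth, std::int64_t> sums;
  for (const auto& order : orders) {
    assert(order.month >= 1 && order.month <= 12);
    // operator[] value-initializes a missing group to 0 before the addition.
    sums[{order.year, order.month}] += order.amount_cents;
  }
  return sums;
}
```

Each distinct (year, month) pair becomes one output group. A tuned operator would typically replace the ordered map with a hash table and split the input across threads; those design choices are exactly what the project phase explores.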

To evaluate each group's aggregate operator, we continuously benchmark each commit on a large multi-core server. A leaderboard will show each group’s performance.

General Information

  • ECTS: 6 (graded)
  • Schedule: Weekly meetings on Tuesdays, from 9.15 to 10.45
  • Room:
    • Weekly meetings will take place in L-1.06
    • Please note: the first meeting (March 14) has been relocated to F-E.06
  • Enrolment Deadline: 30 April 2026. You can deregister until May 6.
  • Modules:
    • CS: I Data and AI (Deep Dive, Specialization), III Systems (Deep Dive, Specialization)
    • DE: Data Systems (DASY - K,T,S), Systems Engineering (SYSE - K, T, S)
    • IT-SE: Operating Systems and Information Systems Technology (OSIS - K, T, S), Software Architecture and Modeling Technology (SAMT - K, T, S)
    • SSE: Software Systems (SSYS - C, T, S), Data-Driven Systems (DSYS - C, T, S)
  • Teaching Form: Seminar / Exercise
  • Enrolment Type: Compulsory Elective Module
  • Course Language: English
  • Maximum number of participants: 18
  • Introductory Lecture: Presentation slides

 

Goals of this Seminar

  • Understanding how modern main memory-optimized databases work
  • Gathering experience in the development of complex and high-performance systems
  • Improvement of your C++ knowledge
  • Working in smaller groups as part of a large open-source software project

This seminar also serves as a solid basis for ongoing work and research on database topics, e.g., in a master's project or thesis.

 

Seminar Structure

The seminar will be held in person.

  • Lecture phase (Mid April to mid June):
    • Sprint 1: Basic table functionality, e.g., column-based data management of different data types
    • Sprint 2: Dictionary encoding
    • Sprint 3: Table scan as the first database operator
    • Sprint 4: Introduction to Hyrise
  • Project phase (starting mid June):
    • Implementation of aggregate operator in Hyrise
    • We ask you to incorporate the code review feedback by the end of the semester (discussed in the first meeting)
    • We are planning three short group presentations during the project phase
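
As a rough illustration of the sprint topics above (column-based storage, dictionary encoding, and a table scan), the following C++ sketch encodes a string column and scans it for one value. The names `DictionaryColumn`, `encode`, and `scan_equals` are hypothetical and do not reflect the interfaces handed out in the sprints or used in Hyrise.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>
#include <vector>

// Hypothetical dictionary-encoded string column.
struct DictionaryColumn {
  std::vector<std::string> dictionary;   // sorted, distinct values
  std::vector<std::uint32_t> value_ids;  // one dictionary index per row
};

// Builds the sorted dictionary, then replaces every value by its index.
DictionaryColumn encode(const std::vector<std::string>& values) {
  DictionaryColumn column;
  column.dictionary = values;
  std::sort(column.dictionary.begin(), column.dictionary.end());
  column.dictionary.erase(
      std::unique(column.dictionary.begin(), column.dictionary.end()),
      column.dictionary.end());
  column.value_ids.reserve(values.size());
  for (const auto& value : values) {
    const auto it = std::lower_bound(column.dictionary.begin(),
                                     column.dictionary.end(), value);
    assert(it != column.dictionary.end() && *it == value);
    column.value_ids.push_back(
        static_cast<std::uint32_t>(it - column.dictionary.begin()));
  }
  return column;
}

// Equality scan: returns the row positions whose value equals search_value.
std::vector<std::size_t> scan_equals(const DictionaryColumn& column,
                                     const std::string& search_value) {
  std::vector<std::size_t> positions;
  const auto it = std::lower_bound(column.dictionary.begin(),
                                   column.dictionary.end(), search_value);
  // A value missing from the dictionary cannot match any row.
  if (it == column.dictionary.end() || *it != search_value) return positions;
  const auto id = static_cast<std::uint32_t>(it - column.dictionary.begin());
  for (std::size_t row = 0; row < column.value_ids.size(); ++row) {
    if (column.value_ids[row] == id) positions.push_back(row);
  }
  return positions;
}
```

The scan looks up the search value in the dictionary once and then compares only fixed-width integers per row, which is one reason dictionary encoding pairs well with in-memory column stores.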

 

Teaching Form and Learning Process

  • Lectures providing a fundamental introduction to the implementation of database concepts and the required C++ knowledge
  • Group-based software project
  • Weekly meetings with your advisor

 

Grading and Deliverables

Over the course of DYOD, each student can earn up to 100 points. The final grade depends entirely on the number of points achieved. There are no exams. To pass the course, groups need to pass the sprints and the project phase. To pass the sprints, all basic tests need to pass. To pass the project, all tests need to pass.

  • Programming tasks (85 points)
    • Sprint implementations (35 points)
    • Code review of another group’s sprint implementations (5 points)
    • Project Implementation (40 points)
    • Code review of another group’s project implementation (5 points)
  • Project presentations (15 points)
    • During the project phase, each group gives three ten-minute intermediate presentations (30.6., 21.7., 11.8.; 5 points each)
  • Bonus points
    • Groups can get bonus points for the project implementation's performance
    • 1 bonus point for each group whose project operator beats the Hyrise baseline
    • 2 bonus points for the best-performing team, 1 bonus point for the second-best-performing team

Criteria for programming tasks are, besides functionality:

  • Code quality
  • Performance
  • Test coverage

Requirements

  • Basic knowledge of (modern) C++; experience in writing high-performance C++ code is advantageous
  • A fundamental understanding of databases (DBS I) is advantageous but not required