Our team is giving a series of lectures and seminars with a focus on enterprise systems design and in-memory data management. Strong links to the industry ensure a close connection between theory and its implementation in the real world.

If you have questions regarding one of our publications, please contact the authors.

Master's Project: Parallelization and Query Plan Optimizations for the TPC-DS benchmark

Description

For relational databases, new features are quite rare. SQL has been largely unchanged for years, so what users care about most is better performance at lower cost. Typically, standardized benchmarks are used to compare the throughput of two competing database systems or that of an old and a new version of the same database. One of these benchmarks is TPC-DS, which simulates queries as seen in decision support systems. Compared to the TPC-H benchmark, which we already support, TPC-DS poses more challenges because its queries are more complex and its input data is skewed.

In this project, we will take the TPC-DS benchmark as a yardstick for improving our own database, Hyrise. The focus will be (1) on improving the scalability of the system, i.e., using additional CPU cores as efficiently as possible and (2) on optimizing the query plans so that more efficient execution paths are chosen.
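As a rough illustration of what "using additional CPU cores as efficiently as possible" means in numbers, the following sketch computes speedup and parallel efficiency from benchmark runtimes measured at different core counts. The function name `scaling_report` and the runtimes are made up for this example; they are not measurements of Hyrise and not part of its benchmark framework.

```python
def scaling_report(runtimes_by_cores):
    """Return (cores, speedup, efficiency) tuples.

    runtimes_by_cores maps a core count to the total runtime (in seconds)
    of the same workload. Speedup is measured relative to the smallest
    core count; efficiency is the speedup divided by the ideal speedup.
    """
    base_cores = min(runtimes_by_cores)
    base_runtime = runtimes_by_cores[base_cores]
    report = []
    for cores in sorted(runtimes_by_cores):
        speedup = base_runtime / runtimes_by_cores[cores]
        ideal_speedup = cores / base_cores
        report.append((cores, speedup, speedup / ideal_speedup))
    return report


# Hypothetical runtimes in seconds, invented for illustration only.
measurements = {1: 120.0, 2: 60.0, 4: 40.0, 8: 30.0}

for cores, speedup, efficiency in scaling_report(measurements):
    print(f"{cores:>3} cores: speedup {speedup:.2f}x, efficiency {efficiency:.0%}")
```

An efficiency that drops as cores are added, as in the invented numbers above, is the kind of signal that would point to scheduling overhead, contention, or unfavorable data placement.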

As we already have a benchmark framework in place, it will be a matter of days before we can look at the first performance numbers. From there, we can track our improvements and will have measurable successes early in the project.

We will not perform any “throw-away work”, but aim for results that can be integrated into the main code base and will improve the overall project. After this project, there will be opportunities to dive deeper into identified issues as part of Master’s theses.

Goals

A number of goals can be addressed independently. We will select goals depending on the number of students, their interests, and our progress during the project:

Implement the TPC-DS benchmark (generating data, parameterizing queries)
Add SQL features that might be missing for the execution of all TPC-DS queries
Analyze the query plans generated for the TPC-DS and teach the optimizer to generate more efficient plans
Identify control flow inefficiencies and improve the operators to a point where we are entirely memory-bound
Improve the scheduler and the placement of data on different NUMA nodes in order to increase the TPC-DS throughput in systems with many cores

Existing Infrastructure

We will not waste any time on setting up dependencies, as most of the setup needed for this project already exists. This includes:

a database system that can execute most queries out of the box,
a benchmark framework, which automatically executes queries in parallel, tracks their execution time and reports the results,
scripts for comparing multiple benchmark runs (e.g., for tracking the improvement made by a commit) and for plotting the throughput with a varying number of CPUs or the improvement over time, and
a code base with a high degree of test coverage (>90%) and a CI server that enforces code quality for all pull requests.
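To give an idea of what comparing two benchmark runs involves, here is a minimal sketch that reports the relative change in mean runtime per query. It is not the project's actual comparison script; the function `compare_runs`, the input layout (query name mapped to a list of runtimes in milliseconds), and all numbers are assumptions made for this example.

```python
from statistics import mean


def compare_runs(before, after):
    """Compare mean runtimes of queries present in both runs.

    before/after map a query name to a list of runtimes in milliseconds.
    Returns (query, old_mean, new_mean, relative_change) tuples, where a
    negative relative change means the query became faster.
    """
    rows = []
    for query in sorted(before.keys() & after.keys()):
        old_mean = mean(before[query])
        new_mean = mean(after[query])
        rows.append((query, old_mean, new_mean, (new_mean - old_mean) / old_mean))
    return rows


# Hypothetical runtimes in milliseconds, invented for illustration only.
run_before = {"q1": [100.0, 110.0, 90.0], "q2": [50.0, 50.0]}
run_after = {"q1": [80.0, 80.0, 80.0], "q2": [55.0, 55.0]}

for query, old_mean, new_mean, change in compare_runs(run_before, run_after):
    print(f"{query}: {old_mean:.1f} ms -> {new_mean:.1f} ms ({change:+.0%})")
```

A real comparison would also need to account for measurement noise, for example by checking whether a difference is statistically significant before attributing it to a commit.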

Learning Goals

During this project, you will gain insights into how databases execute complex queries. This will be helpful even if you do not plan to continue building your own database, because it also enables you as a user of databases to write more efficient SQL queries.

Furthermore, you will get a better understanding of multithreading both on a single processor and on systems with up to 16 CPUs and 480 logical cores.

Finally, as you will have to work with existing components such as the query optimizer and the scheduler, you will learn to familiarize yourself with an existing code base.

Prerequisites

Prior understanding of the fundamentals of databases (e.g., from the Datenbanksysteme lecture, the Trends and Concepts online class, or the Develop your own Database seminar) is expected, as is knowledge of C++.

Contact

You are welcome to contact one of us by e-mail or visit us in the villa.

Matthias Uflacker, Markus Dreseler, Jan Koßmann

Literature

"A Course in In-Memory Data Management" by Prof. Dr. h.c. Hasso Plattner. This book is the culmination of six years of in-memory research. As such, it provides the technical foundation for combined transactional and analytical workloads inside a single database, as well as examples of new applications that are now possible with this technology. The book is available from Springer.

Contact

Dr. Michael Perscheid

Chair Representative

Tel.: +49 (331) 5509-566

E-Mail: michael.perscheid(at)hpi.de

Office:

Room: V-2.12

Tel.: +49 (331) 5509-560

Fax: +49 (331) 5509-579

E-Mail: office-epic(at)hpi.de
