Research and Implementation of Database Concepts

General Information

Teaching staff: Thomas Bodner, Martin Boissier, Markus Dreseler, Jan Koßmann, Dr. Michael Perscheid
4 semester hours per week (SWS) - 6 ECTS (graded)
First meeting: 02 Nov 2020
Time: Monday 15:15
Specialization areas:
- ITSE: BPET, OSIS, SAMT, ITSE-Analyse, ITSE-Maintenance
- DATA: Scalable Data Systems
Maximum of 16 students
Introduction Slides

About this Seminar

Our database research seminar invites students who are interested in working on research-related topics in the area of database systems and, in particular, on our research database systems Hyrise and Skyrise. An introduction is given in the Hyrise and Skyrise research papers and the open-source Hyrise repository.

Logistics

In the first meeting, we will introduce the instructors and present the different topics.
The first meeting will be held online.
Subsequent meetings will be held within the individual groups. Depending on your and your instructor's preferences, these can take place online or in person.

Example Topics

This list of topics is not exhaustive, and we are happy to discuss research projects based on your previous experience and personal interests.

What-If Optimizer: Query optimizers aim at generating the most efficient execution plan for declarative queries based on the underlying data and configuration, e.g., indexes. What-if optimizers fulfill the same task but consider hypothetical, non-existing configurations and data distributions instead. The returned information is vital for self-driving database systems that adjust their configuration autonomously.
Histograms: Histograms are indispensable for accurate cardinality estimations in database systems. At the same time, they can be expensive to create and update. We will look into local histograms that are persisted on disk (allowing faster data loads and recoveries) and that can be merged (while retaining accuracy) and updated.
Tracking Memory Allocations: In-memory databases need to carefully manage their memory resources. While system profilers such as perf or VTune help us understand where memory is allocated, they lack an understanding of the semantic level (i.e., which table the memory was allocated for). By tracking memory allocations directly in the application using polymorphic memory resources, we can enrich them with additional context information. This helps us track down memory waste and optimize resource allocation in scenarios where DRAM capacity limits are reached.
Cost-Performance Tradeoffs in Query Execution on Cloud Functions: To enable Skyrise's prospective query optimizer to trade off cost and performance of queries, we introduce related degrees of freedom into its execution engine. We allow for pre-provisioning of cloud functions to avoid cold-start latencies. We support interleaved materialized execution to reduce the impact of stragglers. Finally, we add a staged data exchange operator for reduced parallelism and storage request cost.
Object Metadata Management for Cloud Storage: Cloud object storage systems, such as Amazon S3, can cost-efficiently store terabytes to petabytes of data in thousands to millions of objects. They, however, provide only weak data consistency guarantees, simplistic data access APIs, and poor request latencies. To enable effective and efficient relational query processing on top of these cloud object stores, we design and implement a table format for Skyrise on top of commonly used columnar file formats, such as Apache ORC, that supports concurrency control, fast statistics lookups, and data pruning.
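
To give a feeling for the Histograms topic above, the following is a minimal Python sketch of mergeable local histograms. It is not taken from Hyrise: the class EqualWidthHistogram, its methods, and the choice of equal-width bins with identical boundaries are assumptions made purely for illustration. Under that assumption, merging two local histograms is an element-wise sum of bin counts and yields exactly the histogram that would have been built over the combined data.

```python
from dataclasses import dataclass
from typing import List, Sequence


@dataclass
class EqualWidthHistogram:
    """Hypothetical equal-width histogram over the value range [lo, hi)."""

    lo: float
    hi: float
    counts: List[int]

    @classmethod
    def build(cls, values: Sequence[float], lo: float, hi: float,
              bins: int) -> "EqualWidthHistogram":
        """Count the values of one chunk into `bins` equally wide bins."""
        counts = [0] * bins
        width = (hi - lo) / bins
        for value in values:
            if lo <= value < hi:
                # Clamp to guard against floating-point rounding near `hi`.
                index = min(int((value - lo) / width), bins - 1)
                counts[index] += 1
        return cls(lo, hi, counts)

    def merge(self, other: "EqualWidthHistogram") -> "EqualWidthHistogram":
        """Merge two local histograms that share the same bin boundaries."""
        same_layout = (self.lo == other.lo and self.hi == other.hi
                       and len(self.counts) == len(other.counts))
        if not same_layout:
            raise ValueError("histograms must share identical bin boundaries")
        summed = [a + b for a, b in zip(self.counts, other.counts)]
        return EqualWidthHistogram(self.lo, self.hi, summed)

    def estimate_range(self, a: float, b: float) -> float:
        """Estimate how many values fall into [a, b), assuming values are
        uniformly distributed within each bin."""
        width = (self.hi - self.lo) / len(self.counts)
        total = 0.0
        for i, count in enumerate(self.counts):
            bin_lo = self.lo + i * width
            bin_hi = bin_lo + width
            overlap = max(0.0, min(b, bin_hi) - max(a, bin_lo))
            total += count * overlap / width
        return total


# Two chunks of the same column, each summarized by a local histogram.
chunk_a = [1, 2, 3, 11]
chunk_b = [4, 12, 13, 25]
hist_a = EqualWidthHistogram.build(chunk_a, lo=0, hi=30, bins=3)
hist_b = EqualWidthHistogram.build(chunk_b, lo=0, hi=30, bins=3)
merged = hist_a.merge(hist_b)
```

In this sketch, merged equals the histogram built directly over chunk_a + chunk_b. Histogram types whose bin boundaries depend on the data (e.g., equal-height histograms) generally cannot be merged this simply, which is part of what makes the topic interesting.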

Learning Goals

Participants will deepen their understanding of data management technologies and improve their systems development skills by working with a large existing code base. Additionally, they will gain experience in the scientific method and scientific writing, which will serve as preparation for their upcoming master’s theses.

Seminar Schedule

Topics: During the first week of the lecture period, potential topics will be presented by the supervisors and chosen by the participants. The topics can be worked on alone or in groups of two.
Familiarization: The participants are expected to familiarize themselves with the chosen topic and study recent publications that are provided by the supervisors.
Project: Afterwards, participants will conduct implementations and evaluations with guidance from the supervisors.
Final Presentations: Presentations of approximately 20 minutes (15 min. presentation + 5 min. Q&A) will be held at the end of the lecture period.
Scientific Report: In the end, a scientific report (4-8 pages in IEEE format, depending on the group size) should put the targeted problem into context (challenges, motivation, and related work), document the approach taken, and present evaluations as well as learnings to answer the research questions raised.

Prerequisites

Good knowledge of C++ and/or Python
Basic knowledge of database systems (e.g., DBS or TuK I lectures)
Prior attendance of the Develop Your Own Database seminar is beneficial but not required

Grading

50% project result and presentation
40% scientific report
10% personal engagement
