Hasso-Plattner-Institut
Prof. Dr. h.c. mult. Hasso Plattner
 

Research and Implementation of Database Concepts

General Information

  • Lecturer: Dr. Michael Perscheid
  • Teaching Team: Thomas Bodner, Martin Boissier
  • ECTS: 6 (graded)
  • Enrollment: 1 - 31 October 2022
  • Schedule: Mondays @ 15:15 (alternative meeting times may be scheduled with your supervisor)
    • Introduction on 17 October 2022 (Slides)
    • Topic assignments on 24 October 2022
    • Presentation on 13 March 2023
    • Documentation due by 24 March 2023
  • Room: L-1.06
  • Zoom: https://uni-potsdam.zoom.us/j/63642146364 (Passcode 39895931)
  • Specialization areas:
    • IT-Systems Engineering MA: BPET; OSIS
    • Data Engineering MA: SCAL (PO 2018); DASY (PO 2022)
    • Digital Health MA: SCAD
    • Software Systems Engineering MA: SSE-D; SSYS; DSYS

About this Seminar

Our database research seminar invites students who are interested in working on research-related topics in the area of database systems and, in particular, on our research database systems Hyrise and Skyrise. An introduction is given in the Hyrise and Skyrise research papers and the open source Hyrise repository.

Example Topics

This list of topics is not exhaustive, and we are happy to discuss research projects based on your previous experience and personal interests.

  • Efficient Histograms: Histograms are used in database systems to estimate cardinalities during query optimization. Improving their accuracy can thus have a significant impact on performance. Hyrise builds histograms for entire columns, making their creation rather expensive, especially for 100 GB+ data sets. In this year’s seminar, you will implement alternative histograms and sampling. The evaluation is done using well-known metrics such as the q-error and end-to-end benchmarks.
  • Extending Serverless Query Execution with Custom Code: Database systems offer their users ways to run application logic close to the data. This often happens in the form of so-called user-defined functions or aggregates (UDFs/UDAs). In this project, you will explore how to integrate UDFs into Skyrise and thereby strike a balance between simplicity, efficiency, and isolation.
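
To make the histogram topic more concrete, here is a minimal Python sketch of how a histogram can estimate the cardinality of a range predicate and how the q-error compares that estimate with the true cardinality. It is not Hyrise's implementation: the equi-width layout, the function names, and the clamping of both values to at least 1 (to avoid division by zero) are assumptions made for illustration only.

```python
def build_equi_width_histogram(values, num_bins):
    """Build a simple equi-width histogram: (lower bound, bin width, bin counts)."""
    lo, hi = min(values), max(values)
    # Fall back to width 1.0 if all values are identical (assumption for the sketch).
    width = (hi - lo) / num_bins or 1.0
    counts = [0] * num_bins
    for v in values:
        # The maximum value falls on the upper edge, so clamp it into the last bin.
        idx = min(int((v - lo) / width), num_bins - 1)
        counts[idx] += 1
    return lo, width, counts


def estimate_less_than(hist, x):
    """Estimate how many values are < x, assuming uniform distribution within a bin."""
    lo, width, counts = hist
    estimate = 0.0
    for i, count in enumerate(counts):
        bin_lo = lo + i * width
        bin_hi = bin_lo + width
        if x >= bin_hi:
            estimate += count  # bin lies completely below x
        elif x > bin_lo:
            estimate += count * (x - bin_lo) / width  # bin is partially covered
    return estimate


def q_error(estimate, actual):
    """q-error = max(est / act, act / est); 1.0 means a perfect estimate."""
    est = max(float(estimate), 1.0)
    act = max(float(actual), 1.0)
    return max(est / act, act / est)


if __name__ == "__main__":
    data = list(range(100))
    hist = build_equi_width_histogram(data, num_bins=10)
    est = estimate_less_than(hist, 49.5)
    act = sum(1 for v in data if v < 49.5)
    print(f"estimate={est:.2f} actual={act} q-error={q_error(est, act):.3f}")
```

On uniformly distributed data such as this, the estimate is close to the true count. Skewed data is where the choice of histogram type and the use of sampling start to matter.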

Learning Process and Seminar Schedule

  1. Topics: During the first week of the lecture period, potential topics will be presented by the supervisors and chosen by the participants. The topics can be worked on alone or in groups of two.
  2. Familiarization: The participants are expected to familiarize themselves with the chosen topic and study recent publications that are provided by the supervisors.
  3. Project: Afterwards, participants implement and evaluate their approach with guidance from the supervisors.
  4. Final Presentation: Presentations of approximately 20 minutes (15 min. presentation + 5 min. Q&A) will be held at the end of the lecture period.
  5. Scientific Report: In the end, a scientific report (4-8 pages in ACM format, depending on the group size) should put the targeted problem into context (challenges, motivation, and related work), document the approach taken, and present evaluations as well as learnings that answer the research questions raised.

Learning Goals

Participants will deepen their understanding of data management technologies and improve their systems development skills by working with a large existing code base. Additionally, they will gain experience in the scientific method and scientific writing, which will serve as preparation for their upcoming master’s theses.

Grading

  • 50% project result and presentation
  • 40% scientific report
  • 10% personal engagement

Prerequisites

  • Good knowledge of C++ and/or Python
  • Basic knowledge of database systems (e.g., DBS or TuK I lectures)
  • Prior attendance of the Develop Your Own Database seminar is beneficial but not required