Relational in-memory database systems achieve high query processing performance by storing all their data in DRAM, which provides much lower access latency than disks. However, DRAM is still relatively expensive compared to other storage technologies such as modern SSDs. Therefore, for cost-effectiveness and to avoid DRAM capacity limitations, we may want to store parts of the data on secondary storage devices, resulting in larger-than-memory database systems. Two common approaches to implementing larger-than-memory databases are using a buffer manager or using memory-mapped file I/O, e.g., via the OS-provided mmap system call.
For small and mid-size data sets, the performance of Hyrise is competitive with that of comparable systems such as MonetDB, DuckDB, HyPer, and Umbra. Now we want to move toward processing terabytes of data. In this project, we will extend our database system Hyrise from a pure main-memory database into a larger-than-memory database system using the memory-mapped file I/O approach. After you have been introduced to the most important components of Hyrise by your supervisors and have familiarized yourself with the codebase, we will first focus on implementing a mechanism to persist table data on SSDs and load the stored data into main memory efficiently. Second, we will evaluate different libraries for memory-mapped file I/O, including their page fault handling, to identify a library that is particularly well suited to the targeted database workloads.
We aim for results that can be integrated into the main codebase and push the open-source Hyrise project forward. After this project, there will be research and engineering opportunities to dive deeper into identified issues in the form of student assistantships, master’s theses, and Ph.D. positions.