Hasso-Plattner-Institut
Prof. Dr. Tilmann Rabl
 

Cloud-Native Database Systems and Unikernels: Reimagining OS Abstractions for Modern Hardware

Summary written by Till Prochaska

In his lecture [1], Prof. Dr. Viktor Leis of the Technical University of Munich (TUM) presents his research on adopting unikernel operating systems (OSs) for Database Management Systems (DBMSs).

On an abstract level, both OSs and DBMSs implement similar functionality to manage compute, memory, I/O, and caching. In practice, however, many of the interfaces provided by OSs are poorly suited for use in DBMSs, frequently requiring developers to reimplement functionality in the DBMS.

So far, optimizing DBMS performance has often required either modifying the OS (e.g., using custom Linux kernel modules) or bypassing the kernel. Both options require significant engineering effort and are difficult to maintain, primarily due to the complexity of legacy OS interfaces, requirements to support legacy and non-standard hardware, and process isolation.

As an alternative, Leis proposes using unikernel OSs for DBMSs. Unlike traditional OSs (such as Linux), unikernels are designed to run only a single process in a single address space. This approach significantly reduces complexity and overhead, allowing for optimizations that were previously infeasible. As virtualization is now common due to the prevalence of cloud deployments, and Database-as-a-Service (DBaaS) offerings are becoming increasingly popular, unikernels have become a promising and realistic approach for running DBMSs in the cloud.

1) The Uneasy Relationship Between Operating Systems and DBMSs

Leis presents three case studies to illustrate what he calls “the uneasy relationship between operating systems and DBMSs”.

Case Study 1: Virtual Memory

DBMSs typically cache data in memory. mmap is a POSIX system call, available on Linux, that maps files from storage into a process’s virtual address space. Applications can then access a file as if it were fully loaded into memory, although the OS doesn’t load the entire file at once. Instead, it loads pages (fixed-size chunks of data) on demand: when a page fault (an access to a page that hasn’t been loaded yet) occurs, the OS lazily loads the page from storage. Subsequent accesses to cached pages are faster. The OS also handles page eviction (removing pages from memory to free up space). All of this happens transparently, and the application has only limited ways to control the caching behavior.
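The transparency described above can be illustrated with a minimal sketch using Python's standard mmap module. The helper names (make_demo_file, read_byte_via_mmap) and the demo file layout are assumptions made for this illustration and are not taken from the lecture.

```python
import mmap
import os
import tempfile

PAGE_SIZE = mmap.PAGESIZE  # page size reported by the OS, commonly 4096 bytes


def make_demo_file(num_pages: int) -> str:
    """Create a temporary file in which every byte of page i has the value i."""
    fd, path = tempfile.mkstemp()
    with os.fdopen(fd, "wb") as f:
        for page_no in range(num_pages):
            f.write(bytes([page_no % 256]) * PAGE_SIZE)
    return path


def read_byte_via_mmap(path: str, page_no: int) -> int:
    """Return the first byte of the given page of a memory-mapped file."""
    with open(path, "rb") as f:
        # Length 0 maps the whole file. Mapping alone does not copy the file
        # into the process; pages are brought in when they are first touched.
        with mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ) as mapped:
            # This looks like a plain memory access. Whether it is served
            # from memory or triggers a page fault is invisible to the caller.
            return mapped[page_no * PAGE_SIZE]


demo_path = make_demo_file(4)
third_page_first_byte = read_byte_via_mmap(demo_path, 2)
os.remove(demo_path)
```

Note that nothing in read_byte_via_mmap reveals whether the access was cheap or required storage I/O, which is exactly the property the following list of issues builds on.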

While this seems like a useful abstraction, Leis refers to research that shows that mmap has multiple issues that make it unsuitable for use in DBMSs, many of them related to mmap’s transparent nature [2]:

  1. Transactional safety. As the DBMS has no control over when pages are evicted from memory, safe transaction handling becomes more complex.
  2. I/O stalls. The DBMS cannot predict whether accessing a page will be fast or result in a page fault, making optimizations (such as prefetching) impossible without workarounds.
  3. Error handling. Handling of I/O errors becomes more complex. Any code accessing memory-mapped data may raise I/O errors that need to be handled properly by DBMS developers.
  4. Performance issues. mmap is inefficient when working with larger-than-memory datasets.

While workarounds for some of these issues exist, they require careful implementation, defeating the purpose of using mmap in the first place: reducing complexity by leveraging OS-level abstractions. Due to these drawbacks, most DBMSs do not use mmap and instead implement their own user-space caching mechanisms.
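A user-space cache of the kind mentioned above can be sketched as a small buffer pool. This is a simplified illustration, not the design of any particular DBMS: the BufferPool class, its LRU policy, the 4096-byte page size, and the demo file are all assumptions chosen for brevity. It relies on os.pread, which is available on Unix-like systems.

```python
import os
import tempfile
from collections import OrderedDict

PAGE_SIZE = 4096  # assumed page size for this sketch


class BufferPool:
    """Tiny read-only page cache with explicit LRU eviction (illustrative)."""

    def __init__(self, fd: int, capacity: int):
        self.fd = fd
        self.capacity = capacity
        self.frames = OrderedDict()  # page_no -> bytes, least recently used first
        self.misses = 0
        self.evicted = []  # page numbers in eviction order

    def get_page(self, page_no: int) -> bytes:
        if page_no in self.frames:
            # Cache hit: mark the page as most recently used.
            self.frames.move_to_end(page_no)
            return self.frames[page_no]
        # Cache miss: the pool itself decides what to evict and when to do I/O.
        self.misses += 1
        if len(self.frames) >= self.capacity:
            victim, _ = self.frames.popitem(last=False)
            self.evicted.append(victim)
        data = os.pread(self.fd, PAGE_SIZE, page_no * PAGE_SIZE)
        self.frames[page_no] = data
        return data


def make_demo_file(num_pages: int) -> str:
    """Create a temporary file in which every byte of page i has the value i."""
    fd, path = tempfile.mkstemp()
    with os.fdopen(fd, "wb") as f:
        for page_no in range(num_pages):
            f.write(bytes([page_no]) * PAGE_SIZE)
    return path


demo_path = make_demo_file(3)
demo_fd = os.open(demo_path, os.O_RDONLY)
pool = BufferPool(demo_fd, capacity=2)
for page_no in (0, 1, 0, 2):
    last_page = pool.get_page(page_no)
os.close(demo_fd)
os.remove(demo_path)
```

In contrast to mmap, every miss and every eviction is an explicit event in the application, so I/O errors surface at the os.pread call and the eviction policy can be tied to transaction logic. The price is that each page access goes through a lookup instead of hardware address translation.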

Another approach to this problem is vmcache, a new design for a virtual memory interface [3]. In contrast to mmap, vmcache gives DBMSs control over page faults and eviction, while still making use of hardware-supported translation of virtual memory addresses. However, a performant implementation of this approach required implementing a custom Linux kernel module, adding significant complexity and maintenance overhead.

Case Study 2: Storage I/O

Modern SSDs are so fast that the efficiency of the I/O interface itself becomes critical. Leis presents a microbenchmark that compares I/O operations per second (IOPS) and CPU utilization for reads of 4 KB pages from SSD storage across different I/O interfaces: synchronous POSIX I/O (pread), asynchronous I/O (io_uring), and the Intel Storage Performance Development Kit (SPDK), a set of low-level libraries for user-space I/O that bypasses the kernel completely.
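For orientation, the synchronous baseline of such a microbenchmark can be sketched as a loop of random page reads with os.pread. This is not the benchmark from the lecture: the function name measure_pread_iops, the file size, and the number of reads are assumptions, and because the file is opened without direct I/O, the reads are most likely served from the OS page cache rather than from the SSD.

```python
import os
import random
import tempfile
import time

PAGE_SIZE = 4096  # assumed page size for this sketch


def measure_pread_iops(path: str, num_reads: int, seed: int = 0) -> float:
    """Issue random synchronous page reads and return reads per second."""
    rng = random.Random(seed)
    num_pages = os.path.getsize(path) // PAGE_SIZE
    fd = os.open(path, os.O_RDONLY)
    try:
        start = time.perf_counter()
        for _ in range(num_reads):
            page_no = rng.randrange(num_pages)
            # Each pread is one system call and blocks until the data arrives.
            data = os.pread(fd, PAGE_SIZE, page_no * PAGE_SIZE)
            if len(data) != PAGE_SIZE:
                raise IOError("short read")
        elapsed = time.perf_counter() - start
    finally:
        os.close(fd)
    # Guard against a zero interval on very coarse clocks.
    return num_reads / max(elapsed, 1e-9)


# Hypothetical demo input: a 64-page file filled with zero bytes.
fd, demo_path = tempfile.mkstemp()
with os.fdopen(fd, "wb") as f:
    f.write(b"\0" * (PAGE_SIZE * 64))
iops = measure_pread_iops(demo_path, num_reads=1000)
os.remove(demo_path)
```

Since a single thread issues one blocking call at a time, keeping many requests in flight with this interface requires many threads, which is the limitation that asynchronous interfaces such as io_uring address.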

Compared with the pread baseline, io_uring was able to achieve maximum IOPS, but only when most OS features (such as the file system and RAID) were disabled. It also consumed significant CPU time for the I/O alone, leaving little room for actual data processing.

Compared to io_uring, SPDK significantly reduced the CPU overhead while still achieving the same number of IOPS. The main drawback of SPDK is its low-level interface, i.e., developers cannot rely on OS features such as the file system.

Again, this shows that the abstractions of a traditional OS prevent fully exploiting modern hardware capabilities.

Case Study 3: Scheduling

LeanStore is a storage engine developed at TUM. As modern SSDs are highly parallel devices, DBMSs have to perform many I/O tasks in parallel to fully exploit SSD bandwidth. Because OS threads come with significant overhead, LeanStore implements its own lightweight user-space scheduling mechanism [4]. However, running entirely in user space has drawbacks compared to OS-level threading, such as the lack of preemption (a thread cannot be stopped unless it explicitly yields control back to the scheduler).
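The trade-off of cooperative user-space scheduling can be sketched with Python generators standing in for lightweight tasks. This is a toy model and not LeanStore's scheduler: the names io_task, cpu_hog, and run_round_robin are hypothetical, and no real I/O is performed.

```python
from collections import deque


def io_task(name, num_requests, log):
    """Task that yields to the scheduler after submitting each I/O request."""
    for i in range(num_requests):
        log.append(f"{name}: submit I/O {i}")
        yield  # hand control back while the (simulated) I/O is in flight


def cpu_hog(name, steps, log):
    """Task that computes for several steps and yields only at the very end."""
    for i in range(steps):
        log.append(f"{name}: compute step {i}")
    yield


def run_round_robin(tasks):
    """Run tasks cooperatively; a task keeps the CPU until it yields."""
    ready = deque(tasks)
    while ready:
        task = ready.popleft()
        try:
            next(task)  # resume the task until its next yield
        except StopIteration:
            continue  # task finished, do not requeue it
        ready.append(task)


fair_log = []
run_round_robin([io_task("A", 2, fair_log), io_task("B", 2, fair_log)])

hog_log = []
run_round_robin([cpu_hog("H", 3, hog_log), io_task("A", 2, hog_log)])
```

In the first run the two I/O tasks interleave because both yield regularly. In the second run task A cannot start until the hog has completed all of its steps, since the scheduler has no way to interrupt a task that does not yield.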

2) Models of DBMS/OS Interaction

These case studies demonstrate three different models of interaction between DBMSs and OSs.

In the standard model, the DBMS runs as an application within the OS. Today, DBMSs are often deployed in the cloud, which typically means that the OS itself runs within a virtual machine managed by a hypervisor. In some cases, DBMSs may also be containerized. Each of these layers is separated by isolation boundaries. This results in overhead due to redundant isolation and makes it difficult to fully exploit modern hardware.

In such cases, customizing the OS (e.g., using custom Linux kernel modules) can be a workaround to expose additional capabilities (such as direct hardware access) to the DBMS. However, implementing and maintaining OS customizations is difficult and, for that reason, may not be practical.

Finally, entirely bypassing the OS can unlock certain optimizations, but this also means that DBMS developers must reimplement many OS features in user space.

Leis concludes that all three approaches have significant drawbacks, but also emphasizes that these are not an inherent problem of Linux as an OS, but rather a result of legacy OS interfaces, support for legacy and non-standard hardware, and process isolation.

Figure 1: Three different models of DBMS/OS interaction

3) The Case for Unikernels

Unikernels are lightweight, single-address-space operating systems. Unlike traditional OSs, unikernels are designed to run only a single process, such as the DBMS. Everything runs in kernel space, so the application can use privileged instructions without the overhead of system calls.

As a consequence, unikernels are no replacement for traditional multi-process OSs. At the same time, this restriction makes them particularly well suited for cloud deployments. From a security perspective, the lack of isolation between the DBMS and the OS is unproblematic: in the cloud, virtualization is the norm, and the isolation of different applications is ensured by the hypervisor at the virtualization level. In fact, the simplicity of unikernels reduces their attack surface, which makes testing and security analysis easier [5].

Figure 2: Unikernel/DBMS interaction compared to the previous DBMS/OS interaction models

According to Leis, there are two developments that have made unikernels a realistic approach:

  1. Cloud deployments. In many cases, databases already run in the cloud, where, as explained above, there is no need for a redundant isolation layer. The cloud hardware landscape is also more homogeneous, making it realistic for unikernels to support a relevant share of hardware configurations. For example, NVMe is the de facto standard for SSD access. While networking hardware is less standardized, large cloud providers have settled on consistent interfaces at least across their own offerings (e.g., the Elastic Network Adapter is widely supported across AWS EC2 instance types).
  2. DBaaS. Users tend to be skeptical of non-standard OSs, and by extension this also applies to DBMSs requiring such a non-standard OS. With the rise of DBaaS offerings, users no longer manage the DBMS and underlying OS themselves, and thus the choice of the OS isn’t a deciding factor anymore.

4) Unikernel/DBMS Co-Design

OSv is a unikernel OS that implements much of POSIX, allowing many applications to run on OSv with little modification [6]. The lack of process isolation makes OSv’s codebase much simpler. For example, its virtual memory implementation is just 2,000 lines of code, compared to 110,000 in Linux. OSv also comes with networking and storage drivers, and can boot directly in AWS EC2 instances.

Leis emphasizes that merely migrating a DBMS to a unikernel such as OSv won’t improve performance. In fact, the opposite might be the case. However, unikernels unlock new opportunities for DBMS optimization that were previously infeasible due to the complexity of traditional OSs [5]. This includes optimizations in the problem areas outlined in section 1:

  • Leveraging virtual memory for caching as unikernels can provide direct control over virtual memory hardware;
  • Achieving high-performance storage I/O using direct hardware access;
  • Simplifying scheduling with preemptive and lightweight unikernel threads.

There is room for optimization in other areas as well. Like storage hardware, networking hardware has become much faster. Legacy OS interfaces cannot fully exploit this speed, leading to similar trade-offs. Beyond the OS, DBMSs could even expose information about their state to the hypervisor, enabling more effective, dynamic provisioning of memory and compute resources across multiple virtual machines.

Leis notes that many cloud DBMSs already use a component-based architecture. These systems are particularly well suited for the unikernel approach, since the unikernel OS can be adopted only for the data-intensive components (e.g., query processing) that would benefit from the optimization opportunities, while other components (such as metadata management) can continue to run on a traditional OS.

5) Discussion

One of the fundamental principles of unikernels is their simplicity. This is what makes many of the optimization opportunities described in the previous section possible in the first place, but it also means giving up on support for non-standard or legacy hardware and interfaces. While many unikernel OSs implement a subset of POSIX interfaces, which can ease porting of existing applications, being fully backwards compatible would be contrary to the goal of simplification. Therefore, porting or developing applications as complex as modern DBMSs for unikernel OSs could pose significant challenges.

Most modern, widely adopted DBMSs have been developed for traditional OSs and battle-tested for decades in many different production scenarios by large numbers of users. While this doesn’t guarantee the absence of bugs (e.g., PostgreSQL’s “fsyncgate”), it gives traditional OSs an advantage that can, according to Leis, only be compensated for by the simplicity of the unikernel approach. As noted in section 3, the lack of security isolation between the application and the OS is unproblematic, because a unikernel is intended to host only a single application, the DBMS, and isolation between different applications is ensured at the virtualization layer.

In order to fully realize the opportunities of unikernels, merely porting DBMSs isn’t enough. Instead, Leis calls for the co-design of the OS and DBMS, designing entirely new abstractions for use with DBMSs. However, this paradigm shift could also raise the barriers for DBMS developers to commit to the approach.

Finally, the main premise for unikernels is the wide adoption of the cloud and the rise of DBaaS. While the trend towards the cloud seems unlikely to reverse, this assumption may not apply to all use cases and DBMSs. However, as Leis points out, virtualization has become common even in on-premises settings, although the more heterogeneous hardware landscape would still pose challenges in these cases.

But there’s another possibility for how unikernels could benefit even DBMSs that don’t adopt them directly: Once DBMS-specific abstractions developed for use with unikernel OSs have matured and proven effective, they could eventually be incorporated into traditional OSs.

6) Conclusion

At the end of his lecture, Leis reflects on his many years of research in the field of DBMS optimizations, emphasizing that all optimization approaches in a traditional OS stack suffer from the complexity and performance overhead of OS abstractions. He calls this his “personal story of radicalization”, which eventually led to his research focus on unikernel/DBMS co-design. With the prevalence of cloud computing and the rising popularity of DBaaS, unikernel/DBMS co-design is a promising approach for cloud DBMSs. Leis is currently exploring the topic as part of the Cumulus research project in collaboration with Technische Universität Braunschweig.

References

[1] Leis, Viktor. “Co-Designing Cloud-Native Database Systems and Unikernels.” Lecture Series on Database Research (2025). www.tele-task.de/lecture/video/11436/

[2] Crotty, Andrew et al. “Are You Sure You Want to Use MMAP in Your Database Management System?” Conference on Innovative Data Systems Research (2022).

[3] Leis, Viktor et al. “Virtual-Memory Assisted Buffer Management.” Proceedings of the ACM on Management of Data 1 (2023): 1-25.

[4] Haas, Gabriel and Viktor Leis. “What Modern NVMe Storage Can Do, And How To Exploit It: High-Performance I/O for High-Performance Storage Engines.” Proc. VLDB Endow. 16 (2023): 2090-2102.

[5] Leis, Viktor and Christian Dietrich. “Cloud-Native Database Systems and Unikernels: Reimagining OS Abstractions for Modern Hardware.” Proc. VLDB Endow. 17 (2024): 2115-2122.

[6] Kivity, Avi et al. “OSv - Optimizing the Operating System for Virtual Machines.” USENIX Annual Technical Conference (2014).