Project Overview

HYRISE: The Open-Source In-Memory Research DBMS

Traditionally, databases are split into systems for current data from day-to-day business processes and systems for reporting and analytics. For fast-moving businesses, moving data from one silo to the other is cumbersome and takes too much time. As a result, data arriving in the reporting system is already outdated by the time it is loaded. HYRISE proposes a new way to solve this problem: it analyzes the query input and reorganizes the stored data along different dimensions. In detail, HYRISE partitions the layout of the underlying tables vertically and horizontally, depending on the input to its layout management component. The workload is specified as a set of queries and weights and is processed by calculating the layout-dependent costs of those queries. Based on our cost model, we can then calculate the best set of partitions for this input workload. This optimization allows great speed improvements compared to traditional storage models.
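
The idea of choosing a layout from a weighted workload can be illustrated with a small sketch. It is not the HYRISE cost model: it assumes a toy cost (bytes read from every touched partition plus a fixed penalty per partition), covers only vertical partitioning, and enumerates all layouts exhaustively, which is feasible only for a handful of attributes. All names (`set_partitions`, `layout_cost`, `best_layout`, `seek_penalty`) are hypothetical.

```python
def set_partitions(items):
    """Yield every way to split the list `items` into non-empty groups."""
    if not items:
        yield []
        return
    first, rest = items[0], items[1:]
    for partition in set_partitions(rest):
        # Either `first` forms a group of its own ...
        yield [[first]] + partition
        # ... or it joins one of the existing groups.
        for i in range(len(partition)):
            yield partition[:i] + [[first] + partition[i]] + partition[i + 1:]


def layout_cost(layout, workload, widths, seek_penalty=4):
    """Weighted cost of a workload under a vertical layout (toy model).

    layout   -- list of attribute groups, e.g. [["a", "b"], ["c"]]
    workload -- list of (set of accessed attributes, weight) pairs
    widths   -- attribute name -> width in bytes
    A query reads every group containing at least one attribute it needs
    and pays an assumed fixed penalty per group touched.
    """
    total = 0
    for attrs, weight in workload:
        touched = [group for group in layout if set(group) & attrs]
        bytes_read = sum(widths[a] for group in touched for a in group)
        total += weight * (bytes_read + seek_penalty * len(touched))
    return total


def best_layout(attributes, workload, widths):
    """Return the layout with the lowest weighted workload cost."""
    return min(
        set_partitions(list(attributes)),
        key=lambda layout: layout_cost(layout, workload, widths),
    )


if __name__ == "__main__":
    widths = {"a": 4, "b": 4, "c": 4, "d": 4}
    workload = [
        ({"a", "b"}, 10),           # frequent query on a and b
        ({"c"}, 5),                 # less frequent query on c
        ({"a", "b", "c", "d"}, 1),  # rare full-row access
    ]
    print(best_layout("abcd", workload, widths))
```

In this toy setting, attributes that are frequently read together end up in the same partition, while rarely co-accessed attributes are separated.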

Contact: Martin Boissier, Stefan Klauck, Keven Richly, Marcel Weisgut, Daniel Lindner, Dr. Michael Perscheid, Prof. Dr. h.c. mult. Hasso Plattner

Research Area: Autonomous Data Management

Skyrise: A Serverless Query Processor

Enterprises increasingly run applications that support their business processes in cloud environments. With more application data residing in the cloud, analytical workloads gain importance, and dedicated infrastructure needs to be provisioned for them before any query processing can begin. Resource provisioning, however, can be difficult for these workloads because they are often unpredictable and ad hoc in nature. To avoid underprovisioning and performance disruption, conservative overprovisioning is the norm, at the expense of cost efficiency.

Recently, cloud providers introduced means to allocate and bill fine-grained units of resources through function-as-a-service (FaaS) platforms and shared cloud storage. We are studying this so-called serverless cloud infrastructure to inform the design of our Skyrise query processing system. Skyrise embraces a FaaS-based, shared-disk distributed architecture to inherit the elasticity and resiliency characteristics of its underlying infrastructure.

Contact: Thomas Bodner, David Justen, Dr. Michael Perscheid, Prof. Dr. h.c. mult. Hasso Plattner

Research Area: Autonomous Data Management

Enable Disaggregated Memory for In-Memory Databases

In-memory databases are widely used in the software industry due to their high query processing performance. A key reason for this performance is that the majority of the data is stored in local DRAM, which results in low data access latencies compared to slower secondary storage devices, such as flash SSDs. However, for analytical databases, DRAM capacity is a limited resource since the size of the data to be processed grows continuously. In traditional data centers, the basic unit is a monolithic server, accommodating small quantities of compute, storage, and network resources. This co-location of the different resources prevents independent scaling of individual resource types and, thus, makes it hard to fulfill the highly elastic and growing memory demands of data-intensive applications. Furthermore, scaling out either computing or storage capacity in a co-located hardware setup implies the need to over-provision the other resource type. One way to tackle these elasticity and utilization shortcomings is a composable data center architecture, which separates traditional server resources into independent compute, storage, and network pools.

Memory disaggregation is a promising technique for achieving elasticity and improving resource utilization. In a research project in collaboration with Seagate Technology, we investigate how disaggregated memory can be used by an in-memory database to expand the local memory capacity of a database server. Disaggregated memory can be considered an additional tier in the memory hierarchy, with its own bandwidth and latency characteristics. We are particularly interested in memory management mechanisms that efficiently store data in disaggregated memory and in autonomous data placement algorithms. For our research, we use Hyrise as an exemplary in-memory database and a hardware prototype for disaggregated memory provided by Seagate Technology.

Contact: Dr. Daniel Ritter, Marcel Weisgut, Martin Boissier, Dr. Michael Perscheid

Research Area: Autonomous Data Management

Data-Driven Causal Inference

The emergence of the Internet of Things (IoT) allows for a comprehensive analysis of industrial manufacturing processes. While domain experts within a company have enough expertise to identify the most common relationships, they require support in dealing with both the increasing amount of observational data and the complexity of large systems of observed features. Causal inference algorithms from machine learning can close this gap by deriving the underlying causal relationships between the observed features.

Contact: Johannes Huegle, Christopher Hagedorn, Dr. Rainer Schlosser, Dr. Michael Perscheid

Research Area: Data-Driven Decision Support

Dynamic Pricing under Competition

Modern e-commerce platforms offer merchants both opportunities and hurdles. While merchants can observe markets at any point in time and automatically reprice their products, they also have to compete simultaneously with dozens of competitors.

Our platform enables analyses of how a strategy's performance is affected by customer behavior, price adjustment frequencies, the competitors' strategies, and the exit and entry of competitors. We compared traditional rule-based strategies with simple data-driven strategies. We find that data-driven merchants outperform rule-based ones as soon as a sufficiently large data set has been gathered.
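
To make the distinction between the two strategy types concrete, the following minimal sketch contrasts them. It does not reproduce the strategies evaluated on the platform: the undercutting rule, the rank-based sale-rate estimate, and all names (`rule_based_price`, `data_driven_price`, `sale_rate_by_rank`) are assumptions chosen for illustration.

```python
def rule_based_price(competitor_prices, cost, undercut=0.01):
    """Rule-based: undercut the cheapest competitor, never price below cost.

    `competitor_prices` is assumed to be non-empty.
    """
    return round(max(cost, min(competitor_prices) - undercut), 2)


def data_driven_price(candidates, competitor_prices, cost, sale_rate_by_rank):
    """Data-driven: pick the candidate with the highest estimated profit.

    sale_rate_by_rank[r] is an assumed, previously observed fraction of
    periods with a sale when our offer had price rank r (0 = cheapest).
    Ranks that were never observed are treated as a sale rate of zero.
    """
    def expected_profit(price):
        # Our rank is the number of competitors that are strictly cheaper.
        rank = sum(1 for p in competitor_prices if p < price)
        return sale_rate_by_rank.get(rank, 0.0) * (price - cost)

    return max(candidates, key=expected_profit)


if __name__ == "__main__":
    competitors = [10.0, 12.0, 15.0]
    observed = {0: 0.5, 1: 0.4, 2: 0.1, 3: 0.05}  # made-up sale rates
    print(rule_based_price(competitors, cost=6.0))
    print(data_driven_price([9.99, 11.99, 14.99], competitors, 6.0, observed))
```

The rule-based merchant always moves to the lowest price, whereas the data-driven merchant may accept a worse price rank when the observed sale rates suggest a higher expected profit. Its estimate is only as good as the gathered data.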

Contact: Dr. Rainer Schlosser, Martin Boissier

Research Area: Data-Driven Decision Support