Prof. Dr. h.c. Hasso Plattner

Projects Overview


Traditional databases are separated into ones for current data from the day-to-day business processes and ones for reporting and analytics. For fast-moving businesses, moving data from one silo to another is cumbersome and takes too much time. As a result, the data arriving in the reporting system is already outdated by the time it is loaded. HYRISE proposes a new way to solve this problem: It analyzes the query input and reorganizes the stored data in different dimensions. In detail, HYRISE partitions the layout of the underlying tables vertically and horizontally, depending on the input to its layout management component. The workload is specified as a set of queries and weights and is processed by calculating the layout-dependent costs for those queries. Based on our cost model, we can then calculate the best set of partitions for this input workload. This optimization allows great speed improvements compared to traditional storage models.
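The idea of choosing a layout from a weighted workload can be illustrated with a minimal sketch. It is not the HYRISE cost model: the cost function, the `access_overhead` parameter, and all function names are assumptions made for illustration, and only vertical partitioning is covered. Each query is assumed to pay a fixed overhead plus the full width of every column group it touches, and all possible column groupings are enumerated by brute force, which is only feasible for a handful of columns.

```python
def set_partitions(items):
    """Yield every way to split a list of items into non-empty groups."""
    if not items:
        yield []
        return
    first, rest = items[0], items[1:]
    for partition in set_partitions(rest):
        # Either `first` forms a group of its own ...
        yield [[first]] + partition
        # ... or it joins one of the existing groups.
        for i in range(len(partition)):
            yield partition[:i] + [[first] + partition[i]] + partition[i + 1:]


def layout_cost(layout, workload, widths, access_overhead):
    """Weighted cost of a layout under a hypothetical cost model.

    A query pays `access_overhead` plus the total width of each column
    group that contains at least one column it accesses.
    """
    total = 0
    for columns, weight in workload:
        for group in layout:
            if any(column in columns for column in group):
                total += weight * (access_overhead + sum(widths[c] for c in group))
    return total


def best_layout(widths, workload, access_overhead=10):
    """Return the cheapest vertical partitioning found by exhaustive search."""
    columns = sorted(widths)
    return min(
        set_partitions(columns),
        key=lambda layout: layout_cost(layout, workload, widths, access_overhead),
    )


if __name__ == "__main__":
    # Hypothetical table with three 4-byte columns.
    widths = {"a": 4, "b": 4, "c": 4}
    # Workload: (set of accessed columns, weight) pairs.
    workload = [({"a", "b"}, 5), ({"c"}, 5)]
    print(best_layout(widths, workload))
```

In this toy workload, columns `a` and `b` are always read together, so grouping them saves one access overhead per query, while `c` stays in its own group so that queries on `c` do not read unneeded bytes.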

Contact: David Schwalb

Research Area: In-Memory Data Management for Enterprise Systems

Natural Language Processing in In-Memory Databases

The current data deluge demands fast, real-time processing of large datasets to support various applications. This also applies to textual data such as scientific publications, Web pages, or social media messages. Natural language processing (NLP) is the field of automatically processing textual documents and includes a variety of tasks such as tokenization (delimitation of words), part-of-speech tagging (assignment of syntactic categories to words), chunking (delimitation of phrases), and syntactic parsing (construction of a syntactic tree for a sentence).

Contact: Dr. Mariana Neves, Dr. Matthias Uflacker

HANA Load Simulator

Screenshot of the running HANA Load Simulator

The HANA Load Simulator creates a realistic enterprise workload of thousands of concurrent users and executes that workload on different database configurations simultaneously. A dashboard monitors several performance indicators of each database, including data footprint, transaction latencies, throughput, and overall CPU utilization. The dashboard can also be used to configure several workload parameters, such as OLTP and OLAP query frequencies or the ratio of actual and historical queries. This provides a simple and interactive tool to assess key performance characteristics of different database setups (e.g., single- vs. multi-node) side by side and in real time.

Contact: Martin Boissier, Carsten Meyer, Dr. Matthias Uflacker

Research Area: In-Memory Data Management for Enterprise Systems

Environmental Monitoring

Air quality is an important factor for human quality of life. As a result, many governments are creating guidelines concerning emissions. To achieve better air quality, governments, industry, and infrastructure providers have to work together to implement effective pollution management programs.

A core requirement for these programs is the effective assessment of the geographical distribution of emissions and their sources. This allows environmental experts to determine the best policies and the most effective locations for emission-mitigating infrastructure.

Contact: Günter Hesse, Markus Dreseler, Dr. Matthias Uflacker

Research Area: In-Memory Data Management for Enterprise Systems

HPI Business Simulator

Today’s reporting offers unprecedented flexibility. Companies can dive into their data, filter by criteria, and drill down into hierarchies to explore their data live and at line-item level. Companies wish to exploit this flexibility not only for reporting but also for forecasting and simulation. They want to define potential future scenarios and calculate how these influence their businesses. By exploiting SAP HANA and our Aggregate Cache technology, what-if analyses can be modeled and run efficiently by means of interactively defined simulation scenarios that are calculated on the fly. In this way, the analyses can support not only the monthly budgeting process but also day-to-day decision-making and simplified planning.

Contact: Stefan Klauck, Dr. Matthias Uflacker

Research Area: In-Memory Data Management for Enterprise Systems

Dynamic Aggregates Caching

The mixed database workloads of enterprise applications comprise short-running transactional queries as well as analytical queries with resource-intensive data aggregations. In this context, caching the results of long-running aggregate queries is desirable because it increases overall performance. In-memory databases with a main-delta architecture are well suited to a new caching mechanism for aggregate queries, which is the main contribution of this ongoing research project. With the separation into main and delta storage, cached aggregates do not have to be invalidated when new data is inserted into the delta storage. Instead, we can use the cached aggregate query result and combine it with the newly added records in the delta storage.
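The following minimal sketch illustrates how a cached aggregate over the main storage can be combined with records from the delta storage. It is a simplified illustration, not the project's implementation: the class `MainDeltaTable`, its methods, and the insert-only workload are assumptions, and dropping the cache on merge is a simplification chosen to keep the example short.

```python
from collections import defaultdict


def sum_by_key(rows):
    """Aggregate (key, value) rows into per-key sums."""
    totals = defaultdict(int)
    for key, value in rows:
        totals[key] += value
    return dict(totals)


class MainDeltaTable:
    """Toy insert-only table with a main store and a delta store (hypothetical)."""

    def __init__(self, main_rows):
        self.main = list(main_rows)    # read-optimized part, changes only on merge
        self.delta = []                # write-optimized part, receives all inserts
        self.cached_main_sums = None   # cached aggregate over the main store only

    def insert(self, key, value):
        # New records go to the delta, so the cached aggregate stays valid.
        self.delta.append((key, value))

    def sums(self):
        # Compute the aggregate over the (large) main store only once.
        if self.cached_main_sums is None:
            self.cached_main_sums = sum_by_key(self.main)
        # Combine the cached result with an aggregate over the (small) delta.
        result = dict(self.cached_main_sums)
        for key, value in sum_by_key(self.delta).items():
            result[key] = result.get(key, 0) + value
        return result

    def merge(self):
        # Simplification (assumption): move the delta into main and drop the cache.
        self.main.extend(self.delta)
        self.delta = []
        self.cached_main_sums = None


if __name__ == "__main__":
    table = MainDeltaTable([("a", 1), ("b", 2), ("a", 3)])
    print(table.sums())
    table.insert("a", 10)
    table.insert("c", 5)
    print(table.sums())
```

Only the delta is scanned on each query after the first one, so the cost of a repeated aggregate query depends on the number of newly inserted records rather than on the size of the main storage.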

Contact: Stephan Müller

Research Area: In-Memory Data Management for Enterprise Systems

High-Performance In-Memory Genome (HIG) Project

The continuous progress in understanding relevant genomic basics, e.g., for the treatment of cancer patients, collides with the tremendous amount of data that needs to be processed. For example, the human genome consists of approx. 3.2 billion base pairs, corresponding to roughly 3.2 GB of data. Identifying a concrete sequence of 20 base pairs within the genome takes hours to days if performed manually. Processing and analyzing genomic data is a challenge for medical and biological research that delays the progress of research projects. From a software engineering point of view, improving the analysis of genomic data is both a concrete research challenge and an engineering challenge. The aim of the HIG project is to combine knowledge of in-memory technology and of real-time analysis of huge amounts of data with the concrete research questions of medical and biological experts.

Contact: Dr. Matthieu Schapranow, Cindy Perscheid

Research Area: In-Memory Data Management for Life Sciences

Interactive Tactic-Board

In recent years, the use of geo-spatial data has increased strongly in various areas. Especially in the highly competitive sports sector, new insights gained from positional information of players, tracked by camera- or sensor-based systems during a game, can have a major impact on the training and tactics of a team. In contrast to current applications, which focus solely on the analysis and visualization of basic metrics like the run distance or the average position of a player, the interactive tactic board enables the analysis of complex tactical patterns. For coaches and video analysts, the analysis of game recordings is an important step during the preparation and post-processing of games. They extract strengths and weaknesses of their teams as well as of their opponents by manually analyzing the video recordings of past games. Since video recordings are an unstructured data source, finding specific game situations or similar patterns in the recordings is a complex and time-intensive task.

Contact: Keven Richly, Dr. Matthias Uflacker

Enterprise Streaming Benchmark

The ever-increasing amount of data that is produced nowadays, from smart homes to smart factories, gives rise to completely new challenges and opportunities. Terms like "Internet of Things" (IoT) and "Big Data" have gained traction to describe the creation and analysis of these new data masses. New technologies were developed that are able to handle and analyze data streams, i.e., data arriving with high frequency and in large volume. In recent years, for example, many distributed Data Stream Processing Systems were developed, whose usage represents one way of analyzing data streams.

Although a broad variety of systems and system architectures is generally a good thing, the bigger the choice, the harder it is to choose. Benchmarking is a common and proven approach to identify the best system for a specific set of needs. However, no satisfying benchmark for modern data stream processing architectures currently exists. Particularly in an enterprise context, i.e., where data streams have to be combined with historical and transactional data, existing benchmarks have shortcomings. The Enterprise Streaming Benchmark (ESB), which is to be developed, will attempt to tackle this issue.

Contacts: Guenter Hesse, Dr. Matthias Uflacker

Research Area: In-Memory Data Management for Enterprise Systems

Project Archive

Find a list of previous projects here.