Prof. Dr. h.c. Hasso Plattner

Projects Overview

Traditional database systems are split into ones that hold current data from day-to-day business processes and ones used for reporting and analytics. For fast-moving businesses, moving data from one silo to the other is cumbersome and takes too much time. As a result, data arriving in the reporting system is already out of date by the time it is loaded. HYRISE proposes a new way to solve this problem: it analyzes the query workload and reorganizes the stored data along different dimensions. In detail, HYRISE partitions the layout of the underlying tables vertically and horizontally, depending on the input to its layout management component. The workload is specified as a set of queries and weights and is processed by calculating the layout-dependent costs of those queries. Based on this cost model, HYRISE calculates the best set of partitions for the input workload. This optimization allows great speed improvements compared to traditional storage models. Read More.
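
The idea of choosing a layout from a weighted workload can be illustrated with a minimal sketch. It is not HYRISE's actual cost model: the cost formula, the `GROUP_OVERHEAD` constant, the column names and widths, and all function names are assumptions made for illustration, and only vertical partitioning is covered. Each query pays a fixed overhead plus the full width of every column group it touches, and all possible groupings are enumerated to find the cheapest one.

```python
# Hypothetical toy model: the cost formula, constants, and names are illustrative only.
GROUP_OVERHEAD = 10  # assumed fixed cost of touching one column group


def set_partitions(items):
    """Yield every way to split the list `items` into non-empty groups."""
    if not items:
        yield []
        return
    first, rest = items[0], items[1:]
    for partition in set_partitions(rest):
        # Put `first` into each existing group in turn ...
        for i in range(len(partition)):
            yield partition[:i] + [[first] + partition[i]] + partition[i + 1:]
        # ... or into a new group of its own.
        yield [[first]] + partition


def query_cost(query_columns, layout, widths):
    """Cost of one query: overhead plus full width of every group it touches."""
    needed = set(query_columns)
    cost = 0
    for group in layout:
        if needed & set(group):
            cost += GROUP_OVERHEAD + sum(widths[c] for c in group)
    return cost


def workload_cost(workload, layout, widths):
    """Weighted sum of the query costs for a given layout."""
    return sum(weight * query_cost(cols, layout, widths) for cols, weight in workload)


def best_layout(columns, workload, widths):
    """Exhaustively pick the vertical partitioning with the lowest workload cost."""
    return min(set_partitions(columns),
               key=lambda layout: workload_cost(workload, layout, widths))


# Assumed example table: column widths in bytes.
widths = {"id": 4, "name": 32, "price": 8, "qty": 4}
# Workload as (accessed columns, weight): one full-row lookup, one frequent aggregate.
workload = [
    (["id", "name", "price", "qty"], 1.0),
    (["price", "qty"], 10.0),
]

layout = best_layout(list(widths), workload, widths)
print(layout, workload_cost(workload, layout, widths))
```

For this assumed workload, the sketch groups `id` with `name` and `price` with `qty`, because the frequent aggregate then reads only the narrow group it needs. Exhaustive enumeration is feasible only for a handful of columns, since the number of possible groupings grows very quickly.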

Contact: Markus Dreseler, Jan Kossmann, Martin Boissier, Stefan Klauck, Dr. Michael Perscheid, Prof. Dr. h.c. Hasso Plattner

Research Area: In-Memory Data Management on Modern Hardware

Data-Driven Causal Inference

The emergence of the Internet of Things (IoT) allows for a comprehensive analysis of industrial manufacturing processes. While domain experts within a company have enough expertise to identify the most common relationships, they need support when faced with both an increasing amount of observational data and the complexity of large systems of observed features. This gap can be closed by causal inference algorithms from machine learning, which derive the underlying causal relationships between the observed features. Read More.

Contact: Johannes Huegle, Christopher Hagedorn, Dr. Rainer Schlosser, Dr. Michael Perscheid

Research Area: Data-Driven Decision Support

Modern e-commerce platforms present merchants with both opportunities and hurdles. While merchants can observe markets at any point in time and automatically reprice their products, they also have to compete simultaneously with dozens of competitors.

Our platform enables analyses of how a strategy's performance is affected by customer behavior, price adjustment frequencies, the competitors' strategies, and the exit and entry of competitors. We compared traditional rule-based strategies with simple data-driven strategies and found that data-driven merchants outperform rule-based ones as soon as a sufficiently large data set has been gathered. Read More.

Contact: Dr. Rainer Schlosser, Martin Boissier

Large enterprises and their information systems produce and collect large amounts of data related to different areas, e.g., manufacturing, finance or human resources. This data can be used to complete tasks more efficiently, automate tasks that are currently executed manually, and also generate insights in order to solve certain challenges.
Nowadays, machine learning techniques are utilized in many fields and use cases. In cooperation with the SAP Innovation Center Network, this bachelor project investigated opportunities to apply machine learning techniques to the problem of order delay prediction. The recent development and in-practice application of in-memory database technology (e.g. SAP HANA) enabled the efficient execution of these techniques on large enterprise datasets. Read More.

Contact: Johannes Huegle, Jan Kossmann, Dr. Michael Perscheid

Research Area: In-Memory Data Management on Modern Hardware

The ever-increasing amount of data produced nowadays, from smart homes to smart factories, gives rise to completely new challenges and opportunities. Terms like "Internet of Things" (IoT) and "Big Data" have gained traction to describe the creation and analysis of these new masses of data. New technologies have been developed that can handle and analyze data streams, i.e., data arriving with high frequency and in large volume. In recent years, for example, many distributed data stream processing systems have been developed, and using them is one way of analyzing data streams.

Although a broad variety of systems and system architectures is generally a good thing, the bigger the choice, the harder it is to choose. Benchmarking is a common and proven approach to identifying the best system for a specific set of needs. Currently, however, no satisfying benchmark for modern data stream processing architectures exists. Existing benchmarks have shortcomings particularly in an enterprise context, i.e., where data streams have to be combined with historical and transactional data. The Enterprise Streaming Benchmark (ESB), which is to be developed, will attempt to tackle this issue. Read more.

Contact: Guenter Hesse

Research Area: Concepts of Modern Enterprise Applications

The aim of the project is to design and implement a technique that identifies markers for given clusters of cancer types. We use state-of-the-art and extended machine learning techniques to analyze genetic cancer data. We have developed an interactive web application for exploration, imported public gene expression data from The Cancer Genome Atlas (TCGA) and the Genotype-Tissue Expression (GTEx) program, and prepared it for analysis in our app. Try it out yourself: visit http://okoa.epic-hpi.de/

Contact: Cindy Perscheid, Milena Kraus, Dr. Michael Perscheid

The current data deluge demands fast, real-time processing of large datasets to support various applications. This also applies to textual data such as scientific publications, Web pages, or social media messages. Natural language processing (NLP) is the field of automatically processing textual documents and includes a variety of tasks such as tokenization (delimitation of words), part-of-speech tagging (assignment of syntactic categories to words), chunking (delimitation of phrases), and syntactic parsing (construction of a syntactic tree for a sentence). Read more.
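
As a small illustration of the first of these tasks, tokenization can be sketched with a regular expression. This is a deliberately simple, assumed approach and not the project's pipeline: it treats each run of word characters as one token and each punctuation character as its own token, and it does not handle cases such as contractions or hyphenated terms. The names `TOKEN_PATTERN` and `tokenize` are hypothetical.

```python
import re

# Assumed rule: a token is either a run of word characters
# or a single character that is neither a word character nor whitespace.
TOKEN_PATTERN = re.compile(r"\w+|[^\w\s]")


def tokenize(text):
    """Split `text` into word and punctuation tokens."""
    return TOKEN_PATTERN.findall(text)


tokens = tokenize("Genes, proteins and diseases are named in text.")
print(tokens)
# ['Genes', ',', 'proteins', 'and', 'diseases', 'are', 'named', 'in', 'text', '.']
```

Later tasks such as part-of-speech tagging and chunking operate on a token list like this one.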

Contact: Dr. Mariana Neves, Dr. Michael Perscheid

Research Area: Concepts of Modern Enterprise Applications

The continuous progress in understanding relevant genomic basics, e.g., for the treatment of cancer patients, collides with the tremendous amount of data that needs to be processed. For example, the human genome consists of approx. 3.2 billion base pairs, which corresponds to about 3.2 GB of data. Identifying a concrete sequence of 20 base pairs within the genome takes hours to days if performed manually. Processing and analyzing genomic data is a challenge for medical and biological research that delays the progress of research projects. From a software engineering point of view, improving the analysis of genomic data is both a concrete research challenge and an engineering challenge. The aim of the HIG project is to combine knowledge of in-memory technology and of real-time analysis of huge amounts of data with the concrete research questions of medical and biological experts. Read more.
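
To make the task of locating a short sequence concrete, here is a minimal sketch of an exact search over a genome held in memory as a string. It is an assumed, simplified illustration rather than the HIG project's method: it uses a plain linear scan, ignores the reverse strand and sequencing errors, and the function name `find_occurrences` and the short example sequences are hypothetical.

```python
def find_occurrences(genome, pattern):
    """Return the start positions of all (possibly overlapping) exact matches."""
    if not pattern:
        raise ValueError("pattern must not be empty")
    positions = []
    start = genome.find(pattern)
    while start != -1:
        positions.append(start)
        # Continue one position further so overlapping matches are found too.
        start = genome.find(pattern, start + 1)
    return positions


# Tiny made-up example; a real genome would be billions of characters long.
print(find_occurrences("ACGTACGTTAGC", "ACGT"))  # [0, 4]
```

A single scan like this is linear in the length of the genome. Workloads with many queries typically build an index over the sequence instead of rescanning it each time.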

Contact: Dr. Matthieu Schapranow, Cindy Perscheid

Research Area: Concepts of Modern Enterprise Applications

In recent years, the use of geo-spatial data has increased strongly in various areas. Especially in the highly competitive sports sector, new insights gained from positional information of players, tracked by camera- or sensor-based systems during a game, can have a major impact on the training and tactics of a team. In contrast to current applications, which focus solely on the analysis and visualization of basic metrics like the distance run or the average position of a player, the interactive tactic board enables the analysis of complex tactical patterns. For coaches and video analysts, the analysis of game recordings is an important step during the preparation and post-processing of games. They extract strengths and weaknesses of their own teams as well as of opponents by manually analyzing the video recordings of past games. Since video recordings are an unstructured data source, finding specific game situations or similar patterns in the recordings is a complex and time-intensive task. Read more.

Contact: Keven Richly, Dr. Michael Perscheid

Project Archive

Find a list of previous projects here.