Previous Research Projects

Machine Learning for Sales Order Fulfillment

Large enterprises and their information systems produce and collect large amounts of data related to different areas, e.g., manufacturing, finance or human resources. This data can be used to complete tasks more efficiently, automate tasks that are currently executed manually, and also generate insights in order to solve certain challenges.
Nowadays, machine learning techniques are utilized in many fields and use cases. In cooperation with the SAP Innovation Center Network, this bachelor project investigated opportunities to apply machine learning techniques to the problem of order delay prediction. The recent development and in-practice application of in-memory database technology (e.g. SAP HANA) enabled the efficient execution of these techniques on large enterprise datasets. Read More.

Contact: Johannes Huegle, Jan Kossmann, Dr. Michael Perscheid

Research Area: In-Memory Data Management on Modern Hardware

ESPBench - The Enterprise Stream Processing Benchmark

The ever-increasing amount of data that is produced nowadays, from smart homes to smart factories, gives rise to completely new challenges and opportunities. Terms like "Internet of Things" (IoT) and "Big Data" have gained traction to describe the creation and analysis of these new data masses. New technologies were developed that are able to handle and analyze data streams, i.e., data arriving with high frequency and in large volume. In recent years, for example, many distributed data stream processing systems have been developed, whose usage represents one way of analyzing data streams.

Although a broad variety of systems or system architectures is generally a good thing, the bigger the choice, the harder it is to choose. Benchmarking is a common and proven approach to identify the best system for a specific set of needs. However, no satisfactory benchmark for modern data stream processing architectures currently exists. Particularly in an enterprise context, i.e., where data streams have to be combined with historical and transactional data, existing benchmarks have shortcomings. The Enterprise Streaming Benchmark (ESB), which is to be developed, will attempt to tackle this issue. Read more.

Contacts: Guenter Hesse

Research Area: Enterprise Software Engineering

OKOA - Identifying Discriminant Cancer Genes

The aim of the project is to design and implement a technique that identifies markers for given clusters of cancer types. We use state-of-the-art and extended machine learning techniques to analyze genetic cancer data. We have developed an interactive exploratory web application. We have imported public gene expression data from The Cancer Genome Atlas (TCGA) and the Genotype-Tissue Expression (GTEx) program and prepared it for analysis in our app. Try it out yourself at http://okoa.epic-hpi.de/!

Contact: Cindy Perscheid, Milena Kraus, Dr. Michael Perscheid

Natural Language Processing in In-Memory Databases

The current data deluge demands fast, real-time processing of large datasets to support various applications. This also holds for textual data, such as scientific publications, Web pages or messages in social media. Natural language processing (NLP) is the field of automatically processing textual documents and includes a variety of tasks such as tokenization (delimitation of words), part-of-speech tagging (assignment of syntactic categories to words), chunking (delimitation of phrases) and syntactic parsing (construction of a syntactic tree for a sentence). Read more.

Contact: Dr. Mariana Neves, Dr. Michael Perscheid

Research Area: Concepts of Modern Enterprise Applications

Real-time Sales Data Exploration (POS Explorer)

The POS Explorer helps retail companies to explore their sales data in real time. It supports employees in initiating and planning new promotions. Typical questions are which products should be promoted and how they will react to price changes. To achieve this, the software offers different views on the data such as the basket analysis or the week matrix. Exploring the raw POS data with sub-second response times and finding new interesting combinations of products for promotions generates helpful insights, resulting in actual business value. Read more.

Project Team: Prof. Dr. h.c. Hasso Plattner, Dr. Jens Krüger, Martin Faust, David Schwalb

Related Research Area: In-Memory Data Management for Enterprise Systems

Project Period: since Oct 2012

HANA Oncolyzer

Charité Medicine, Charité IT, SAP's Innovation Center in Potsdam, and the Enterprise Platform and Integration Concepts (EPIC) chair at the Hasso Plattner Institute (HPI) combine their competences in the research initiative "HANA Oncolyzer" to improve the IT-aided treatment of patients suffering from cancer diseases. The organizational changes in the healthcare sector require increasing support by proper IT techniques and procedures. Data needs to be available in real-time at any location on mobile devices to support efforts of researchers and medical doctors. The immense increase of knowledge about cancer requires the detailed analysis of biological and genetic mutations of cancer cells to make only these harmful cells the target of future treatments and to reduce side effects. Read more.

Project Team: Prof. Dr. h.c. mult. Hasso Plattner, Dr. Matthieu Schapranow

Related Research Area: In-Memory Data Management for Life Sciences

Project Period: started in Jul 2011

High-Performance In-Memory Genome (HIG) Project

The continuous progress in understanding relevant genomic basics, e.g., for the treatment of cancer patients, collides with the tremendous amount of data that needs to be processed. For example, the human genome consists of approx. 3.2 billion base pairs, corresponding to about 3.2 GB of data. Identifying a concrete sequence of 20 base pairs within the genome takes hours to days if performed manually. Processing and analyzing genomic data is a challenge for medical and biological research that delays the progress of research projects. From a software engineering point of view, improving the analysis of genomic data is both a concrete research challenge and an engineering challenge. The aim of the HIG project is to combine knowledge of in-memory technology and of real-time analysis of huge amounts of data with concrete research questions of medical and biological experts. Read more.

Contact: Dr. Matthieu Schapranow, Cindy Perscheid

Research Area: Concepts of Modern Enterprise Applications

Elastic Online Analytical Processing on RAMCloud

Using shared DRAM as persistence for an in-memory DBMS

A shared-nothing architecture is state-of-the-art for deploying a distributed analytical in-memory database management system: it preserves the in-memory performance advantage by processing data locally on each node but is difficult to scale out. Modern switched fabric communication links such as InfiniBand narrow the performance gap between local and remote DRAM data access to a single order of magnitude. Based on these premises, this project introduces a distributed in-memory database architecture that separates the query execution engine and data access: this enables a) the usage of a large-scale DRAM-based storage system such as Stanford's RAMCloud and b) the push-down of bandwidth-intensive database operators into the storage system. We address the resulting challenges such as finding the optimal operator execution strategy and partitioning scheme. The project demonstrates that such an architecture delivers both the elasticity of a shared-storage approach and the performance characteristics of operating on local DRAM. In our project we created AnalyticsDB, a prototypical analytical query processor with a pluggable storage layer.

Project Team: Prof. Dr. h.c. mult. Hasso Plattner, Christian Tinnefeld

Related Research Area: In-Memory Enterprise Data Management

Project Period: since Oct 2010

AnalyzeD - A Virtual Design Observatory

With analyzeD we aim to create and disseminate a design project analyzer that will enable researchers beyond the HPDTRP community to conduct Design Thinking research. The application will be set up as a software as a service (SaaS) solution. As a result, we remain in partial control of the ongoing research activities. That will allow HPDTRP to benefit directly, by having data access, and indirectly, by being cited. The initial setup of the service will encompass the functionalities developed for the d.store application within the last two years and is also based on a generic architecture that offers simplified access and is able to handle large data sets. Beyond existing d.store functionality, analyzeD will allow us to tap into CAD log file data of various engineering projects. Equipped with this empirical data, we aim to quantitatively model and statistically test Design Thinking paradigms. A strong candidate for testing is the consecutive rapid iteration paradigm. Can recurring patterns and wave-like movements in the captured design activity indicate a well-explored solution space and enhanced output quality? To our current knowledge, testing core Design Thinking assumptions with large, real-life data samples would be a first in Design Thinking research.

Project Members HPI: Dr. Matthias Uflacker, Thomas Kowark

Project Members Stanford: Larry Leifer, Martin Steinert

Related Research Area: Methods for Enterprise Systems Design and Engineering

Project Period: since Sep 2010

Interactive Tactic-Board

In recent years, the use of geo-spatial data has increased strongly in various areas. Especially in the highly competitive sports sector, new insights gained from positional information of players – tracked by camera or sensor-based systems during a game – can have a major impact on the training and tactics of a team. In contrast to current applications, which focus solely on the analysis and visualization of basic metrics like the run distance or the average position of a player, the interactive tactic board enables the analysis of complex tactical patterns. For coaches and video analysts, the analysis of game recordings is an important step during the preparation and post-processing of games. They extract strengths and weaknesses of their teams as well as opponents by manually analyzing the video recordings of past games. Since video recordings are an unstructured data source, it is a complex and time-intensive task to find specific game situations or similar patterns in the recordings. Read more.

Contact: Keven Richly, Dr. Michael Perscheid

Dynamic Aggregates Caching

The mixed database workloads of enterprise applications comprise short-running transactional queries as well as analytical queries with resource-intensive data aggregations. In this context, caching the results of long-running aggregate queries is desirable as it increases the overall performance. In-memory databases with a main-delta architecture are well suited to a new caching mechanism for aggregate queries, which is the main contribution of this ongoing research project. With the separation into main and delta storage, cached aggregates do not have to be invalidated when new data is inserted into the delta storage. Instead, we can use the cached aggregate query result and combine it with the newly added records in the delta storage. Read more.
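The caching idea described above can be illustrated with a minimal sketch. It is not the project's implementation: the class `MainDeltaTable`, its methods, and the single hard-coded aggregate (a sum per group key) are hypothetical names chosen for illustration, and a real main-delta store would also have to handle updates, deletes, and concurrency.

```python
from collections import defaultdict


class MainDeltaTable:
    """Toy table with a read-optimized main store and a write-optimized delta store.

    All names here are illustrative assumptions, not the project's API.
    """

    def __init__(self, main_rows):
        self.main = list(main_rows)  # rows are (group_key, amount) tuples
        self.delta = []              # newly inserted rows land here
        self._cache = None           # cached SUM(amount) GROUP BY key over main only

    def insert(self, key, amount):
        # Inserts only touch the delta, so the cached main aggregate stays valid.
        self.delta.append((key, amount))

    @staticmethod
    def _aggregate(rows):
        totals = defaultdict(int)
        for key, amount in rows:
            totals[key] += amount
        return dict(totals)

    def sum_by_key(self):
        # Aggregate the (large) main store once and reuse the result afterwards.
        if self._cache is None:
            self._cache = self._aggregate(self.main)
        # Combine the cached result with an on-the-fly aggregate of the small delta.
        result = dict(self._cache)
        for key, amount in self._aggregate(self.delta).items():
            result[key] = result.get(key, 0) + amount
        return result

    def merge(self):
        # Moving the delta into main changes main, so the cache must be rebuilt.
        self.main.extend(self.delta)
        self.delta.clear()
        self._cache = None


table = MainDeltaTable([("A", 10), ("B", 5), ("A", 7)])
before = table.sum_by_key()
table.insert("A", 3)
table.insert("C", 4)
after = table.sum_by_key()
```

In this sketch the expensive pass over the main store happens only on the first query and after a merge; every other query pays only for the delta.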

Contact: Stephan Müller

Research Area: In-Memory Data Management for Enterprise Systems

Dynamic Tour Planning

Sales representatives (reps) maximize sales profitability by selling goods and services. They visit retail stores within an assigned territory to fully exploit the sales potential for the products of their represented company. During their visits, the sales reps record and optimize product placement, install advertising displays, check products for compliance (i.e., whether the store offers them for sale) and for out-of-stock situations, and talk to the stores’ managers to improve the product representation within the store. Sales reps regularly have to schedule their store visits for the upcoming time frame with the goal of choosing the “right” stores, i.e., those expected to increase sales profitability the most. Until now, this planning has been done manually based on static and aggregated data provided in spreadsheets by the rep’s manager. Read more.

Contact: Martin Faust, Stefan Klauck, David Schwalb, Dr. Matthias Uflacker

Research Area: In-Memory Data Management for Enterprise Systems

Environmental Monitoring

Air quality is an important factor for human quality of life. As a result, many governments are creating guidelines concerning emissions. To achieve better air quality, governments, industry and infrastructure providers have to work together to implement effective pollution management programs.

A core requirement for these programs is the effective assessment of the geographical distribution of emissions and their sources. This allows environmental experts to determine the best policies and most effective locations for emission mitigating infrastructure. Read more.

Contact: Günter Hesse, Markus Dreseler, Dr. Matthias Uflacker

Research Area: In-Memory Data Management for Enterprise Systems

Flowr

Taking development workflow to the next level

Programming workflow can be disturbed by unnecessarily long phases of debugging and web searches for the needed information. To reduce the duration of such phases, Flowr aims at bringing the right information directly to the developer at the right time. This helps increase developer efficiency and also reduces developer stress caused by many cognitive context switches between developing and searching for problem solutions. Flowr analyzes design-time and runtime exceptions and searches for solutions and strategies in different data sources. When an exception is thrown, it is isolated from surrounding output and split into parts like type, file, and line. The exception text is then scanned for relevant tokens, such as library versions or technical details. After preprocessing, the exception is fed into several data sources via their respective APIs. Flowr then ranks the results of these queries internally, taking user feedback into account. It then composes a distraction-free result page, sorted by relevance. This ranked list is shown to the developer, who then decides which solution matches the problem best and rates results based on their usefulness. Read more.
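The preprocessing step described above (isolating an exception from surrounding output, splitting it into type, file, and line, and scanning the text for tokens such as version numbers) could look roughly like the following sketch. It assumes Python-style tracebacks; the names `ParsedException`, `parse_exception`, and `relevant_tokens` as well as the regular expressions are hypothetical and do not reflect Flowr's actual code.

```python
import re
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class ParsedException:
    exc_type: str
    message: str
    file: str
    line: int


# Assumed formats: CPython traceback frames and a final "SomeError: message" line.
FRAME_RE = re.compile(r'^\s*File "(?P<file>[^"]+)", line (?P<line>\d+)')
ERROR_RE = re.compile(r'^(?P<type>[A-Za-z_][\w.]*(?:Error|Exception)): ?(?P<message>.*)$')
VERSION_RE = re.compile(r'\b\d+(?:\.\d+)+\b')


def parse_exception(output: str) -> Optional[ParsedException]:
    """Isolate the first traceback in mixed program output and split it into parts."""
    file, line = None, None
    for raw in output.splitlines():
        frame = FRAME_RE.match(raw)
        if frame:
            # Later frames overwrite earlier ones, so the innermost frame is kept.
            file, line = frame["file"], int(frame["line"])
            continue
        error = ERROR_RE.match(raw)
        if error and file is not None:
            return ParsedException(error["type"], error["message"], file, line)
    return None


def relevant_tokens(message: str) -> List[str]:
    """Scan the exception text for dotted version numbers."""
    return VERSION_RE.findall(message)


sample_output = """INFO starting worker
Traceback (most recent call last):
  File "app/main.py", line 12, in <module>
    run()
  File "app/worker.py", line 48, in run
    import_check()
ImportError: numpy 1.16.0 is required, found 1.15.4
INFO shutting down
"""

parsed = parse_exception(sample_output)
tokens = relevant_tokens(parsed.message)
```

The parsed type, message, and tokens could then be used to build search queries for the data sources; querying those sources and ranking the results are outside the scope of this sketch.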

Contact: Ralf Teusner

Research Area: Methods for Enterprise Systems Design and Engineering

Global Availability-To-Promise

Global Availability-To-Promise (ATP) provides necessary information about the availability of various products. Global in this context describes the need to gather information from heterogeneous enterprise-wide systems. The ATP check describes the process step that is involved when a customer queries the availability of a certain product. The result of this check must be delivered in real time rather than in batch mode. A typical use case is an online store that shows the number of available units of a product selected by a customer. Once the availability of a product is guaranteed and the customer decides to place a sales order, the Order-To-Cash scenario is triggered. Read more.

Contact: Dr. Matthieu Schapranow

GoRFID

High Performance Discovery Service and Information Services for the EPCglobal Network

The GoRFID Project is a strategic SAP project that targets the development and evaluation of performance-critical applications within the EPCglobal Network Architecture. Visibility and real-time awareness are the two major use cases for the implementation of RFID in supply chain management. Objects assigned unique identifiers (Electronic Product Codes, EPCs) travel from production facilities to consumers, producing data that is highly relevant for most supply chain processes. Such data needs to be stored and exchanged among different, potentially independent, supply chain parties. Read more.

Contact: Martin Lorenz

Research Area: In-Memory Data Management for Enterprise Systems

HANA Load Simulator

The HANA Load Simulator creates a realistic enterprise workload of thousands of concurrent users and executes that workload on different database configurations simultaneously. A dashboard monitors several performance indicators of each database, incl. data footprint, transaction latencies, throughput, and overall CPU utilization. The dashboard can also be used to configure several workload parameters like OLTP and OLAP query frequencies or the ratio of actual and historical queries. This provides a simple and interactive tool to assess key performance characteristics of different database setups (e.g., single- vs. multi-node) side-by-side and in real-time. Read more.

Contact: Martin Boissier, Carsten Meyer, Dr. Matthias Uflacker

Research Area: In-Memory Data Management for Enterprise Systems

HANA Oncolyzer

Help to improve cancer treatments by real-time analysis of medical data

Charité Medicine, Charité IT, SAP's Innovation Center in Potsdam, and the Enterprise Platform and Integration Concepts (EPIC) chair at the Hasso Plattner Institute (HPI) combine their competences in the research initiative "HANA Oncolyzer" to improve the IT-aided treatment of patients suffering from cancer diseases. The improved knowledge about tumor physiognomy and about active medical ingredients will support cancer treatments. As a result, cancer therapies can be adjusted more accurately to individual patients and cancer forms in order to demonstrably improve healing. Read more.

Contact: Dr. Matthieu Schapranow

Research Area: In-Memory Data Management for Life Sciences

HPI Business Simulator

Today’s reporting offers unprecedented flexibility. Companies can dive into their data, filter for criteria, and drill down into hierarchies to explore their data live and at line-item level. Companies wish to exploit this flexibility not only for reporting but also for forecasting and simulation. They want to define potential future scenarios and calculate how these influence their businesses. Exploiting SAP HANA and our Aggregate Cache technology, what-if analyses can be modeled and run efficiently by means of interactively defined simulation scenarios that are calculated on the fly. In that way, the analyses can support not only the monthly budgeting process but also day-to-day decision-making and simplified planning. Read more.

Contact: Stefan Klauck, Dr. Matthias Uflacker

Research Area: In-Memory Data Management for Enterprise Systems

In-Memory Real-Time Energy Management

This Bachelor Project is a cooperation between the SAP Innovation Center Potsdam and the Chair of Prof. Dr. h.c. Hasso Plattner. It focuses on the real-time evaluation and processing of the huge amounts of data that arise from smart grids, both for enterprises and for customers, since smart homes and smart industries offer great opportunities to address the existing challenges in the energy business. In-memory column store technology allows us to process this data in real time. Read more.

Contact: Dr. Matthieu Schapranow

Research Area: In-Memory Data Management for Enterprise Systems

InterMobilyzer

Intermodal Mobility using In-Memory Databases

The usage of electric vehicles is especially attractive for people living in urban areas. Those people often only have to drive short distances and are able to charge their electric vehicles at home. Thus, the limited travel distance does not negatively affect the overall comfort of owning an electric vehicle vs. using a normal car. Nevertheless, in larger cities like Berlin, the range provided by one charging cycle might not be enough for one day. For drivers of electric vehicles, a trip becomes more complicated if they need to recharge on the way, as recharging may take several hours. This project has built a prototype to make it more comfortable to drive an EV, even when recharging is required. Read more.

Contact: Christian Schwarz

MediTweet

MediTweet is an open messaging system for clinical environments. It connects Clinical Information Systems (CIS) with both medical devices and personnel. With MediTweet the users are enabled to send structured messages to other users in order to automate documentation and task synchronization. Additionally, users are automatically informed about medical results of patients by medical devices. Read more.

Contact: Martin Boissier, Carsten Meyer, Stefan Klauck, Dr. Matthias Uflacker

Research Area: In-Memory Data Management for Life Sciences

PopulAid

PopulAid is a tool to generate customized data for application testing. Via a convenient web interface, developers can easily pick their database schemas, assign generators to columns, and get immediate previews of potential results. In doing so, generators consider not only specific value properties for one column such as the data type, ranges, data pools, distributions, or the number of distinct values, but also keep foreign keys, allow for pattern evaluation and fulfill dependencies for column combinations. PopulAid allows developers to create data in a scalable and efficient manner by applying these generators to SAP HANA. Read more.
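The idea of assigning generators to columns can be sketched as follows. This is only an illustration under stated assumptions: the helper names (`sequence`, `int_range`, `from_pool`, `foreign_key`, `generate`) are hypothetical, the rows are plain Python dictionaries, and nothing here reflects PopulAid's actual interface or its integration with SAP HANA.

```python
import itertools
import random


def sequence(start=1):
    """Generator for unique, increasing key values."""
    counter = itertools.count(start)
    return lambda rng: next(counter)


def int_range(low, high):
    """Generator for integers within an inclusive range."""
    return lambda rng: rng.randint(low, high)


def from_pool(pool):
    """Generator that draws values from a fixed data pool."""
    return lambda rng: rng.choice(pool)


def foreign_key(parent_rows, column):
    """Generator that only emits values existing in the parent table's column."""
    keys = [row[column] for row in parent_rows]
    return lambda rng: rng.choice(keys)


def generate(generators, count, seed=0):
    """Build `count` rows by applying one generator per column."""
    rng = random.Random(seed)  # seeded so that previews are reproducible
    return [
        {column: gen(rng) for column, gen in generators.items()}
        for _ in range(count)
    ]


customers = generate(
    {"id": sequence(), "country": from_pool(["DE", "US", "NG"])},
    count=5,
)
orders = generate(
    {
        "id": sequence(),
        "customer_id": foreign_key(customers, "id"),
        "quantity": int_range(1, 10),
    },
    count=20,
)
```

Because the `orders` generator draws `customer_id` only from existing customer rows, every generated order refers to a valid customer, which mirrors the foreign-key handling mentioned above.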

Contact: Ralf Teusner

Research Area: Methods for Enterprise Systems Design and Engineering

Real-time Sales Data Exploration (POS Explorer)

Contact: Martin Faust, David Schwalb, Dr. Matthias Uflacker

Research Area: In-Memory Data Management for Enterprise Systems

SORMAS

In October 2014, the Helmholtz Center for Infectious Diseases, Robert Koch Institute, Bernhard Nocht Institute, Nigeria Field Epidemiology and Laboratory Training Program (NFELTP), Hasso Plattner Institute, and SAP consolidated their efforts and expertise in an interdisciplinary committee to build the Surveillance Outbreak and Response Management System (SORMAS), a management tool to support identifying emerging infections and suspected cases as well as their contacts and leveraging immediate information exchange between all involved parties of outbreak control. In order to meet the specific technical requirements of West African countries, SORMAS consists of applications for both desktop PCs and Android smartphones that are connected to a central data management platform. Read more.

Contact: Cindy Perscheid, Dr. Matthieu-P. Schapranow, Dr. Matthias Uflacker

Research Area: In-Memory Data Management for Life Sciences

Predictive Analytics on In-Memory Databases

For manufacturers, it is important to have an accurate demand forecast for their products in order to avoid over- or undercapacity in their stores. In the case of vendor-managed inventory, the manufacturer is solely responsible for filling the shelves inside the retail stores. Point-of-Sale (POS) data is one of the most important bases for forecasting. However, for various reasons, many shops cannot provide this kind of data. Instead of using imprecise shipment forecasting, new approaches have to be evaluated. Read more.

Contact: Christian Schwarz

Research Area: In-Memory Data Management for Enterprise Systems