Prof. Dr. h.c. Hasso Plattner


A Course in In-Memory Data Management

The Inner Mechanics of In-Memory Databases

Recent achievements in hardware and software development, such as multi-core CPUs and DRAM capacities of multiple terabytes per server, enabled the introduction of a revolutionary technology: in-memory data management. This technology supports the flexible and extremely fast analysis of massive amounts of enterprise data. Professor Hasso Plattner and his research group at the Hasso Plattner Institute in Potsdam, Germany, have been investigating and teaching the corresponding concepts and their adoption in the software industry for years.

This book is based on the first online course on the openHPI e-learning platform, which was launched in autumn 2012 with more than 13,000 learners. The book is designed for students of computer science, software engineering, and IT-related subjects, but it equally addresses business experts, decision makers, software developers, technology experts, and IT analysts. Plattner and his group focus on exploring the inner mechanics of a column-oriented, dictionary-encoded in-memory database. Topics covered include, among others, physical data storage and access, basic database operators, compression mechanisms, and parallel join algorithms. Beyond that, implications for future enterprise applications and their development are discussed. Readers are led to understand the radical differences and advantages of the new technology over traditional row-oriented, disk-based databases.

Author: Prof. Dr. h.c. Hasso Plattner

Building a Columnar Database on RAMCloud

Database Design for the Low-Latency Enabled Data Center

This book examines the field of parallel database management systems and illustrates the great variety of solutions based on a shared-storage or a shared-nothing architecture. Constantly dropping memory prices and the desire to operate with low-latency responses on large sets of data paved the way for main memory-based parallel database management systems. However, this area is currently dominated by the shared-nothing approach, which preserves the in-memory performance advantage by processing data locally on each server. The main argument this book makes is that such a unilateral development will cease due to the combination of the following three trends: a) Today’s network technology features remote direct memory access (RDMA) and narrows the performance gap between accessing main memory on the local server and on a remote server to a single order of magnitude or even less. b) Modern storage systems scale gracefully, are elastic, and provide high availability. c) A modern storage system such as Stanford’s RAMCloud even keeps all data resident in main memory. Exploiting these characteristics in the context of a main memory-based parallel database management system is desirable. The book demonstrates that the advent of RDMA-enabled network technology makes the creation of a parallel main memory DBMS based on a shared-storage approach feasible.

Author: Christian Tinnefeld

High-Performance In-Memory Genome Data Analysis

How In-Memory Database Technology Accelerates Personalized Medicine

Recent achievements in hardware and software development have enabled the introduction of a revolutionary technology: in-memory data management. This technology supports the flexible and extremely fast analysis of massive amounts of data, such as diagnoses, therapies, and human genome data. This book shares the latest research results of applying in-memory data management to personalized medicine, changing it from computational possibility to clinical reality. The authors provide details on innovative approaches to enabling the processing, combination, and analysis of relevant data in real time. The book bridges the gap between medical experts, such as physicians, clinicians, and biological researchers, and technology experts, such as software developers, database specialists, and statisticians. Topics covered in this book include, among others, modeling of genome data processing and analysis pipelines, high-throughput data processing, exchange of sensitive data, and protection of intellectual property. Beyond that, it shares insights on research prototypes for the analysis of patient cohorts, topology analysis of biological pathways, and combined search in structured and unstructured medical data, and it outlines completely new processes that have become possible due to interactive data analyses.

Authors: Prof. Dr. h.c. Hasso Plattner, Dr. Matthieu Schapranow

Benchmarking Transaction and Analytical Processing Systems

The Creation of a Mixed Workload Benchmark and its Application

Systems for Online Transaction Processing (OLTP) and Online Analytical Processing (OLAP) are currently separate. The potential of the latest technologies and changes in operational and analytical applications over the last decade have given rise to the unification of these systems, which can benefit both workloads. Research and industry have reacted, and prototypes of hybrid database systems are now appearing. Benchmarks are the standard method for evaluating, comparing, and supporting the development of new database systems. Because OLTP and OLAP systems have been separate, existing benchmarks focus on only one or the other. With the rise of hybrid database systems, benchmarks to assess these systems will be needed as well. Based on an examination of existing benchmarks, a new benchmark for hybrid database systems is introduced in this book. It is then used to determine the effect of adding OLAP to an OLTP workload and to analyze the impact, in mixed-workload scenarios, of optimizations typically used in the historically separate OLTP and OLAP domains.

Author: Dr. Anja Bog

Multi Tenancy for Cloud-Based In-Memory Column Databases

Workload Management and Data Placement

With the proliferation of Software-as-a-Service (SaaS) offerings, it is becoming increasingly important for individual SaaS providers to operate their services at a low cost. This book investigates SaaS from the perspective of the provider and shows how operational costs can be reduced by using “multi tenancy,” a technique for consolidating a large number of customers onto a small number of servers. Specifically, the book addresses multi tenancy on the database level, focusing on in-memory column databases, which are the backbone of many important new enterprise applications. To implement multi tenancy efficiently in a farm of databases, two fundamental challenges must be addressed: (i) workload modeling and (ii) data placement. The first involves estimating the (shared) resource consumption for multi tenancy on a single in-memory database server. The second consists of assigning tenants to servers in a way that minimizes the number of required servers (and thus costs) based on the assumed workload model. This step also entails replicating tenants for performance and high availability. This book presents novel solutions to both problems.

Author: Dr. Jan Schaffner

Real-time Security Extensions for EPCglobal Networks

Case Study for the Pharmaceutical Industry

The transformation towards EPCglobal networks requires technical equipment for capturing event data and IT systems to store the data and exchange it with supply chain participants. For the very first time, supply chain participants thus face the automatic exchange of event data with business partners. Protecting sensitive business secrets is therefore the major issue that needs to be resolved before companies will start to adopt EPCglobal networks. This book contributes to resolving it as follows: it defines the design of transparent real-time security extensions for EPCglobal networks based on in-memory technology. To that end, it defines authentication protocols for devices with low computational resources, such as passive RFID tags, and evaluates their applicability. Furthermore, it outlines all steps for implementing history-based access control for EPCglobal software components, which enables continuous access control based on real-time analysis of the complete query history and fine-grained filtering of event data. The applicability of these innovative data protection mechanisms is demonstrated by integrating them, as an example, into the FOSSTRAK architecture.

Author: Dr. Matthieu Schapranow

A Real-Time In-Memory Discovery Service

Leveraging Hierarchical Packaging Information in a Unique Identifier Network to Retrieve Track and Trace Information

The research presented in this book discusses how to efficiently retrieve track and trace information for an item of interest that took a certain path through a complex network of manufacturers, wholesalers, retailers, and consumers. To this end, a super-ordinate system called "Discovery Service" is designed that has to handle large amounts of data, high insert rates, and a high number of queries. An example used throughout this book is the European pharmaceutical supply chain, which faces the challenge that more and more counterfeit medicinal products are being introduced. Between October and December 2008, more than 34 million fake drug pills were detected at customs control at the borders of the European Union. These fake drugs can put lives in danger because they were supposed to fight cancer or act as painkillers or antibiotics, among other uses. The concepts described in this book can be adopted for supply chain management use cases other than track and trace, such as recall, supply chain optimization, or supply chain analytics.

Author: Dr. Jürgen Müller