Hasso-Plattner-Institut
Prof. Dr. h.c. Hasso Plattner

Bachelor’s Project – In-Memory Data Management

A Common Database Approach for OLTP and OLAP Using an In-Memory Column Database

This bachelor’s project was conducted from October 2008 to July 2009 at our research group in cooperation with SAP AG. Starting with an analysis of the characteristics of OLTP systems, the project focused on how an in-memory data layer can change the way “traditional” OLTP applications work. By eliminating redundantly stored aggregates, views, and summary data, the applications became 10-100 times faster while change management and extensibility improved. Furthermore, in-memory data management enables analytics and search on the same set of data. Consequently, a set of entirely new applications becomes feasible, with a reduced total cost of ownership as a side effect. These applications will enable businesses to become real-time enterprises: gaining real-time insights, making more confident decisions, anticipating and reacting to changing conditions, and becoming more flexible and more effective by taking better advantage of the data that is already there. We applied the concept of in-memory data management to build an improved Financial Accounting. Connecting SAP’s in-memory column store TREX to an SAP ERP system, while adapting the applications to the changed programming paradigm, demonstrated the impact of such a technology.

Motivation

Because enterprise applications are built on 20-year-old data management infrastructure, which was designed to meet the requirements of OLTP systems, the goal of the project was to leverage the advantages of in-memory column-oriented data management for enterprise applications. Nowadays, enterprise applications have become more sophisticated, data set sizes have increased, requirements on the freshness of input data have risen, and the time allotted for completing business processes has been reduced. Enterprise applications have become increasingly complicated to make up for shortcomings in the data management infrastructure, including separate stores for transactional and analytical data. In fact, enterprise applications use only a small subset of current database features, especially for transactional behavior and session handling. Storing data redundantly along predefined characteristics is used to work around performance issues, accepting both an increase in complexity and a decrease in flexibility. Enterprise applications today are primarily focused on the day-to-day transaction processing needed to run the business, while the analytical processing necessary to understand and manage the business is added on afterwards. At the same time, hardware has evolved; for example, the size of main memory has increased significantly and multi-core CPUs allow massive parallelization. Column databases are a new data management technology in which data is organized along columns, in contrast to conventional RDBMSs, where it is organized along rows. Column databases are particularly attractive for analytical queries, which often require subsets of relations and aggregations across columns. Memory-based column databases, such as SAP's Business Warehouse Accelerator (TREX), can rapidly execute analytics-style queries.
In contrast to disk-based approaches, in-memory computing allows rapid access to single business entities as well, even though sequential scanning remains faster. Due to compression, memory-based column databases can manage the data set sizes even of large businesses on a handful of blade servers by distributing data across multiple servers using a shared-nothing approach.
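The row-versus-column distinction can be sketched as follows. This is a hypothetical illustration (not SAP TREX code, and the field names are invented): an analytical aggregation over one attribute touches only a single contiguous array in the column layout, whereas the row layout must read through whole records.

```python
# Row store: each record is kept together.
rows = [
    {"id": 1, "region": "EMEA", "revenue": 120.0},
    {"id": 2, "region": "APJ",  "revenue": 75.0},
    {"id": 3, "region": "EMEA", "revenue": 230.0},
]

# Column store: each attribute is kept as a contiguous array.
columns = {
    "id":      [1, 2, 3],
    "region":  ["EMEA", "APJ", "EMEA"],
    "revenue": [120.0, 75.0, 230.0],
}

# An analytical query such as SUM(revenue) needs only one attribute,
# so the column layout scans exactly the data it requires.
total_row_store = sum(r["revenue"] for r in rows)  # reads whole records
total_col_store = sum(columns["revenue"])          # reads one array

assert total_row_store == total_col_store == 425.0
```

The same contiguity is what makes sequential scans over a single column fast on modern hardware: the values sit next to each other in memory and stream through the CPU caches.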

Enterprise Application-specific Data Management

Furthermore, part of this project was a requirements engineering effort for enterprise data management, derived from the data characteristics and usage patterns of enterprise applications in realistic customer environments. By using both the identified enterprise application-specific characteristics and the in-memory column store SAP TREX, a prototype has been built to enable complex OLTP operations and real-time analytics by modifying an SAP ERP system. The following database techniques have been taken into account and leveraged in the proof-of-concept prototype:

  • In-memory data management
  • Column-wise data representation
  • Lightweight data compression
  • Technical insert-only approach
  • Enterprise application-specific compression process (Merge)
  • Semantic horizontal partitioning for parallelization
  • Capability of on-the-fly aggregation and full-table scans
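One of the techniques above, lightweight data compression, is commonly realized as dictionary encoding. The following is a minimal sketch under that assumption (the function and data are illustrative, not taken from TREX): each distinct column value is stored once in a dictionary, and the column itself holds small integer codes.

```python
def dictionary_encode(values):
    """Return (dictionary, codes) for a column of values."""
    dictionary = sorted(set(values))  # sorted, so codes preserve value order
    code_of = {v: i for i, v in enumerate(dictionary)}
    codes = [code_of[v] for v in values]
    return dictionary, codes

# Enterprise columns often have few distinct values (countries, statuses),
# which is what makes this compression so effective in practice.
countries = ["DE", "US", "DE", "FR", "DE", "US"]
dictionary, codes = dictionary_encode(countries)

assert dictionary == ["DE", "FR", "US"]
assert codes == [0, 2, 0, 1, 0, 2]

# Scans can operate directly on the integer codes; e.g. counting
# "DE" rows only compares small integers, never strings:
de_code = dictionary.index("DE")
assert sum(1 for c in codes if c == de_code) == 3
```

Because comparisons run on fixed-width integer codes rather than the original values, compression here speeds up scans instead of slowing them down.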

Modifications, which have been implemented on the SAP ERP system:

  • Redundancy-free data schema

    • All aggregates are computed on-the-fly
    • Materialized views for specific ledgers are removed and generated online
    • Results of background jobs are no longer stored but computed directly on demand
    • The simpler schema eliminates the extra inserts and updates on aggregates (no locking on aggregates needed)

  • Insert-only Approach

    • Simplifies transaction handling (Snapshot isolation and Application-level locking)
    • History of changes is managed by the data management layer
    • Reporting on historical data is possible even on OLTP data

  • Stored procedures for elementary business functionality, for example:

    • Compute balance of an account
    • Determine open or overdue items in Financial Accounting
    • Currency conversion

  • Integrated Analytics

    • OLAP queries are executed on the OLTP system
    • Used to improve transactional processes, for example: customer segmentation query
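Several of the modifications above can be illustrated together. The following is a hypothetical sketch (not the actual ERP modification; the account number and tick counter are invented): line items are only ever appended, following the insert-only approach, and an account balance, one of the elementary stored-procedure functions named above, is computed on the fly by scanning the items instead of maintaining a stored, locked aggregate.

```python
line_items = []  # append-only journal of (account, amount, tick)

def post(account, amount, tick):
    """Insert-only write: rows are appended, never updated or deleted."""
    line_items.append((account, amount, tick))

def balance(account, as_of=None):
    """On-the-fly aggregate, optionally as of an earlier point in time."""
    return sum(amount for acct, amount, tick in line_items
               if acct == account and (as_of is None or tick <= as_of))

post("4711", 100.0, tick=1)
post("4711", -30.0, tick=2)
post("4711", 500.0, tick=3)

assert balance("4711") == 570.0

# Because every change survives as a row, the history of changes lives in
# the data layer itself, and reporting on past states needs no extra store:
assert balance("4711", as_of=2) == 70.0
```

With no stored aggregate there is nothing to lock on a concurrent posting, which is why the simpler schema removes the need for aggregate locking.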

The project demonstrated the concept of enterprise application-specific data management optimized for unified OLTP and OLAP workloads. Such a data management layer is much simpler than standard databases and, by exploiting the derived application-specific characteristics, has the potential to boost the performance of enterprise applications just as column store databases did for OLAP systems. One example of the greatly improved performance was the dunning run on our real customer data: in the old system the dunning run took 20 minutes to complete, while in our modified system it executed in near real-time, in 1.5 seconds.

More

Publications:

  • Hasso Plattner: A Common Database Approach for OLTP and OLAP Using an In-Memory Column Database. Proceedings of the 35th SIGMOD International Conference on Management of Data, Providence, Rhode Island, 2009
  • Martin Grund, Jens Krueger, Christian Tinnefeld, Alexander Zeier: Vertical Partition for Insert-Only Scenarios in Enterprise Applications. IEEE 16th International Conference on Industrial Engineering and Engineering Management (IE&EM '09), Beijing, China, 2009

Supervisors: Prof. Dr. Hasso Plattner, Dr. Jens Krueger, Dr. Martin Grund

Related Research Area: In-Memory Enterprise Data Management

Project Period: 10/2008 - 07/2009