Systems for Online Transaction Processing (OLTP) and Online Analytical Processing (OLAP) are currently separated. The potential of latest technologies and changes in operational and analytical applications over the last decade give rise to the unification of these systems, which can be of benefit for both workloads. Research and industry have reacted and prototypes of hybrid database systems are appearing. Benchmarks are the standard method to evaluate, compare, and support the development of new database systems. Because of the separation of OLTP and OLAP systems, existing benchmarks are only focused on either one. With the unification of the two workloads and the rise of hybrid database systems, benchmarks to assess these systems are required as well. Based on an examination of existing benchmarks, the Composite Benchmark for Transactions and Reporting (CBTR) as a new benchmark for hybrid database systems is proposed in this thesis. CBTR is used to determine the effect of adding OLAP to an OLTP workload. Furthermore, it is applied to analyze the impact of typically used optimizations in the so far separate OLTP and OLAP domains in mixed workload scenarios. In contrast to established benchmarks, CBTR adds a parameter that controls how OLTP and OLAP workloads are mixed.
As companies are continuously adapting to market requirements their workloads are continuously changing. Benchmark workloads age respectively and have to be adjusted to stay realistic. CBTR's data set and the simulated workload are based on the data and workload of a current and widely used enterprise system as an up-to-date and realistic foundation and experiences from the established benchmarks. The queries of the OLTP and OLAP workload parts of CBTR access the same set of data enabling maximum query flexibility and up-to-date data for analytics while avoiding redundant data storage. For the tested database systems, the results of using CBTR to analyze the effect of adding OLAP to OLTP show that OLTP response times increase by factors between two and six when adding up to 100 concurrent OLAP clients to a base workload of 100 OLTP clients. Yet, with higher OLTP base workloads the increase of response times by adding OLAP clients becomes less pronounced.
In hybrid systems, the use of optimizations within the data set that is now used for both workloads has to be carefully considered. The logical database design optimizations so far used in OLTP or OLAP systems are largely in conflict, meaning that the usage of a specific optimization with a positive effect for an OLTP workload very likely has a negative impact on OLAP queries. Here, the results of using CBTR to assess the impact of selected optimizations reveals deviating behaviors for different database systems and workload mixes. Consequently, the analysis using CBTR represents a vital method to determine which optimizations are suitable for a given scenario and database system.
This thesis contributes in the areas of benchmarking and understanding logical database design decisions in mixed OLTP and OLAP workload scenarios. The contributions and adjacent works have been published in international conferences, workshops, and journals.
Existing benchmarks targeting relational databases in OLTP or OLAP environments are compared and discussed with regard to observations from current enterprise workloads. Enterprise workloads have evolved since the creation of the benchmarks widely in use today. Yet, the benchmark workloads remained the same since their specification with minor changes to requirements. The review of existing benchmarks and observations of current enterprise systems provide the foundation for the definition of CBTR as proposed in this thesis. CBTR is based on a real database design and it provides realistic queries to simulate different workloads. The scenario for CBTR, its conceptual entities for the definition of the data set and operations are completely specified and a set of analytical queries that can easily be extended is provided based on examinations of current OLAP systems. Workload mix has been introduced in CBTR as an additional parameter besides data set size and load (number of parallel users) to control the shares of the operational and analytical workload parts. The specific database schema used in CBTR is identical to what can be found in a widely used and current OLTP system. A prototypical implementation of a tool chain to run the benchmark is also part of this thesis. This tool chain provides the driver to run benchmark tests, supports the evaluation of benchmark results, and supplies a monitor to run benchmarks interactively to test different workload mixes and immediately observe the impact on query and transaction performance. Using this tool chain, the impact of mixing OLTP and OLAP can be quantified for any database system based on realistic data structures and workloads.
Logical Database Optimizations in Mixed Workloads
Data models used in the past and currently used in transaction and analytical processing are reviewed and discussed to create a deeper understanding of where the optimizations within today's OLTP and OLAP systems originated. On the base of this review of data models in transaction and analytical processing, logical database designs are analyzed. The focus lies on database schema optimizations employed in typical operational and analytical environments. This analysis contributes an overview of the key differentiators employed today that set apart OLTP and OLAP optimized database schemas. Based on the differentiators several variants of CBTR's database schema are defined that employ particular optimizations or avoid them.
The implemented benchmark driver as part of the previously mentioned tool chain is designed in such a modular way that only little effort is necessary to exchange the database schema and adapt the transactions and queries accordingly. The performance impact of different database schema variants is evaluated for selected database systems and workload mixes as part of this work. This evaluation serves as a validation of CBTR as assumptions about the results are met and can be explained with the aid of implementation knowledge of the databases. The database schema variants have been analyzed regarding their potential for performance improvements under differing mixed OLTP and OLAP workloads. The results provide valuable insights into the behavior of existing database systems under mixed workload conditions and show that the variation of database schemas is an important factor that has to be considered when changing the workload. The results, furthermore, underline the expectation that different database types (disk-based and row-oriented vs. disk-based and column-oriented vs. in-memory and column-oriented) react differently and, thus, a method to test and configure a given database in line with the workload it is supposed to handle is highly relevant.