Hasso-Plattner-Institut
SDG am HPI

Johannes Wust

Mixed Workload Management for In-Memory Databases — Executing Mixed Workloads of Enterprise Applications with TAMEX

In-memory database management systems (IMDBMS) that leverage a column-oriented storage layout have been proposed as an integrated database approach for transactional and analytical enterprise applications. Compared to disk-based databases, this architecture reduces the execution time of complex analytical queries on transactional data by orders of magnitude, to merely seconds. The two main reasons for this dramatic performance increase are massive intra-query parallelism on multicore CPUs and primary data storage in main memory. The benefits of adopting IMDBMS are substantial for companies: they can simplify the data management layer and open the way for new applications based on real-time analytics on the latest transactional data. However, the highly desirable approach of processing a mixed workload of transactional queries, classical batch-oriented analytics, and real-time analytics on a single database system introduces competition for database resources. In the worst case, queries from less response-time-critical reporting applications could block the processing of business-critical queries from transactional applications. To execute mixed workloads on a single database, workload management features have been implemented for established disk-based databases. Most of these approaches are based on processor sharing, which is prohibitively expensive for IMDBMS, as IMDBMS try to keep the number of running threads close to the number of available processor cores. Consequently, this dissertation proposes a workload management solution for IMDBMS that allows prioritizing certain query classes and enforcing defined resource shares for workloads, while executing queries efficiently and in parallel on multiprocessor architectures.

As a first step towards designing a workload manager for IMDBMS, this dissertation defends the thesis that a task-based query execution model is best suited for processing a mixed workload and enforcing workload management objectives. The general idea of a task-based execution model is to partition complex queries into smaller, non-preemptive units of work, so-called tasks, and to let a user-level scheduler map these tasks dynamically to a pool of worker threads. Workload management objectives are enforced by scheduling tasks based on static and dynamically calculated priorities. As a next step, we have implemented TAMEX, a TAsk-based framework for Multiple query class EXecution, to demonstrate the effectiveness of the proposed concept. In an extensive evaluation, we show that TAMEX can efficiently prioritize the execution of specific workloads while minimizing the performance impact of less response-time-critical queries on high-priority queries. Furthermore, we show that TAMEX can enforce defined resource shares for workloads based on dynamic task priorities. The evaluation of TAMEX is based on a set of workload management objectives and a set of mixed workload queries derived from an analysis of the workload characteristics of enterprise applications.
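The abstract does not describe TAMEX's scheduler internals, so the following Python sketch is only a hypothetical, simplified model of the task-based idea, not TAMEX's implementation. Queries are split into non-preemptive tasks, a fixed pool of workers (no more threads than cores) takes tasks from a user-level scheduler, and each dispatch picks the ready query with the best priority. That priority is either static per query class or recomputed at every dispatch from the worker time each class has consumed relative to a target share. The names `Query`, `simulate`, `static_priority`, and `share_priority`, the class labels, and the share-deficit formula are assumptions for illustration. Simulated time replaces real threads so that the behavior is deterministic.

```python
import heapq
from collections import defaultdict
from dataclasses import dataclass, field


@dataclass
class Query:
    name: str
    query_class: str  # hypothetical workload label, e.g. "oltp" or "olap"
    arrival: float    # time at which the query is submitted
    tasks: list = field(default_factory=list)  # durations of non-preemptive tasks


def simulate(queries, workers, priority_of):
    """Toy discrete-event simulation of a task-based query executor.

    A fixed pool of `workers` threads repeatedly takes the next task of the
    ready query with the lowest key returned by `priority_of(query, consumed)`,
    where `consumed` maps each query class to the worker time dispatched to it
    so far. Tasks of one query may run on several workers at once; a started
    task always runs to completion (no preemption).
    Returns ({query name: completion time}, [(start time, query name), ...]).
    """
    pending = {q.name: list(q.tasks) for q in queries}
    consumed = defaultdict(float)
    finish, trace = {}, []
    free_at = [0.0] * workers  # min-heap of times at which workers become idle
    while any(pending.values()):
        now = heapq.heappop(free_at)
        ready = [q for q in queries if q.arrival <= now and pending[q.name]]
        if not ready:
            # Nothing to run yet: this worker idles until the next arrival.
            heapq.heappush(free_at, min(q.arrival for q in queries if pending[q.name]))
            continue
        # Priorities are recomputed at every dispatch; ties go to earlier arrivals.
        query = min(ready, key=lambda q: (priority_of(q, consumed), q.arrival))
        duration = pending[query.name].pop(0)
        consumed[query.query_class] += duration  # charged when the task starts
        finish[query.name] = max(finish.get(query.name, 0.0), now + duration)
        trace.append((now, query.name))
        heapq.heappush(free_at, now + duration)
    return finish, trace


# Static policy: a fixed priority per query class (hypothetical values).
CLASS_PRIORITY = {"oltp": 0, "olap": 1}


def static_priority(query, consumed):
    return CLASS_PRIORITY[query.query_class]


def share_priority(shares):
    """Dynamic policy: favor the class that is furthest below its target share."""
    def key(query, consumed):
        total = sum(consumed.values()) or 1.0
        return consumed[query.query_class] / total - shares[query.query_class]
    return key


if __name__ == "__main__":
    olap = Query("olap-1", "olap", arrival=0, tasks=[10] * 8)
    oltp = Query("oltp-1", "oltp", arrival=5, tasks=[1, 1])
    for label, policy in [("fifo", lambda q, c: 0), ("static", static_priority)]:
        finish, _ = simulate([olap, oltp], workers=2, priority_of=policy)
        print(label, finish)
    # fifo {'olap-1': 40.0, 'oltp-1': 41.0}
    # static {'olap-1': 41.0, 'oltp-1': 11.0}
```

With two workers, the short transactional query that arrives at time 5 behind a long analytical query finishes at 11 under the static policy, because it only waits for the two already running tasks to complete, versus 41 under plain arrival-order scheduling. Passing `share_priority({"a": 0.75, "b": 0.25})` instead makes the scheduler hand out worker time to two competing classes in roughly a 3:1 ratio while both have tasks pending. Charging a task's time when it starts rather than when it finishes is a simplification of this sketch.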