Hasso-Plattner-Institut
Prof. Dr. h.c. mult. Hasso Plattner
 

HANA Load Simulator

Motivation

The HANA Load Simulator creates a realistic enterprise workload of thousands of concurrent users and executes that workload on different database configurations simultaneously. A dashboard monitors several performance indicators of each database, including data footprint, transaction latency, throughput, and overall CPU utilization. The dashboard can also be used to configure workload parameters such as the OLTP and OLAP query frequencies or the ratio of current to historical queries. The result is a simple, interactive tool for assessing key performance characteristics of different database setups (e.g., single-node vs. multi-node) side by side and in real time.

Experiment

We compare (a) a single database node with (b) a multi-node setup consisting of a master node (current/actual data only), one replica of the master node for running OLAP transactions, and a cold node for historical data. Both setups have the same total number of cores and the same total amount of main memory. Use of the replica node can be switched on and off. The workload consists of three types of transactions in a configurable ratio: invoice postings (sFIN-adapted), read-only transactions including transactional queries (including BKPF-BSEG joins), and OLAP transactions including read-heavy analytical queries.
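A minimal sketch of how such a configurable workload mix could be sampled and routed to the nodes of the multi-node setup. All names (WorkloadMix, next_transaction, route) are hypothetical and not taken from the simulator. The split between invoice postings and read-only transactions is an assumption, and so is the routing policy; the default shares mirror the mix used for the results reported on this page.

```python
from __future__ import annotations

import random
from dataclasses import dataclass


@dataclass(frozen=True)
class WorkloadMix:
    """Hypothetical workload parameters; not the simulator's actual settings."""
    olap_share: float = 0.01            # one in 100 queries is analytical
    posting_share_of_oltp: float = 0.5  # ASSUMPTION: split is not stated in the text
    olap_current_only: float = 0.90     # share of OLAP touching current data only
    oltp_current_only: float = 1.00     # share of OLTP touching current data only


def next_transaction(mix: WorkloadMix, rng: random.Random) -> tuple[str, str]:
    """Draw one (transaction type, data scope) pair according to the mix."""
    if rng.random() < mix.olap_share:
        kind, current_only = "olap", mix.olap_current_only
    else:
        is_posting = rng.random() < mix.posting_share_of_oltp
        kind = "posting" if is_posting else "read_only"
        current_only = mix.oltp_current_only
    scope = "current" if rng.random() < current_only else "historical"
    return kind, scope


def route(kind: str, scope: str, replica_enabled: bool) -> str:
    """Pick a target node in the multi-node setup (ASSUMED routing policy)."""
    if scope == "historical":
        return "cold"      # historical data lives on the cold node
    if kind == "olap" and replica_enabled:
        return "replica"   # offload analytical reads from the master
    return "master"        # all writes and remaining reads on current data


if __name__ == "__main__":
    rng = random.Random(42)
    mix = WorkloadMix()
    for _ in range(5):
        kind, scope = next_transaction(mix, rng)
        print(kind, scope, "->", route(kind, scope, replica_enabled=True))
```

Switching replica_enabled off sends current-only OLAP transactions back to the master, which corresponds to the on/off switch for the replica described above.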

With the data partitioned into current and historical data and the current data replicated, we see the following improvements (for a workload with 90% current-only OLAP transactions, 100% current-only OLTP transactions, and one in 100 queries being analytical):

Improved performance

  • Transactional processing improves even without a replica because the data set is smaller. With the replica activated, the multi-node setup is up to 4x faster for mixed workloads.

  • The more the workload is skewed toward current-only queries, the more the new architecture outperforms the traditional setup.

  • When adding analytical users to the system, a replica of the current master node lowers the latency of OLTP transactions due to better load distribution.

Reduced costs

  • Historical data can be purged and compressed more aggressively, which decreases the memory footprint. The setup therefore requires less main memory than the traditional setup, in which all data is memory-resident.

  • Overall system costs potentially decrease because smaller servers can be deployed for the historical nodes, avoiding the disproportionately high prices of large server systems.

Thousands of SAP customers in the fully occupied Orange County Convention Center in Orlando, FL, and viewers of the live stream saw the impact of current/historical data optimization for SAP HANA in terms of database performance and system load.

Martin Boissier and Carsten Meyer had the chance to present the master project "HANA Load Simulator" (Daniel Kurzynski, Rui Ruhrlaender, Christopher Schmidt, Jannik Marten, Jan-Peer Rudolph, Alexander Franke, Jasper Schulz, and Pedro Flemming) live on stage during Prof. Hasso Plattner's keynote speech.

 

Desirable: Seeing and comparing the impact of fundamental system changes helps users understand the meaning of those changes and the true value behind them. Changing simulation parameters and getting direct feedback allows users to explore system behavior and to weigh the merits of various options.

Feasible: SAP HANA features read-only replication and current/historical data partitioning. The workload and data set can be generated and adapted to closely resemble a production load, and can be deployed to different hardware setups.

Viable: A technical description is less convincing than a running system. If people see the positive effects of a system setup, they are more willing to test it. If people can evaluate the impact of changes in their own environment, they are more willing to buy it.

Vision

The HANA Load Simulator shall visualize the impact of current/historical partitioning and read-only replication on a customer's production system. Future versions will also show the potential for increased reliability (via high availability) using hot-standby replicas. With access to a customer's production data and a corresponding workload trace, the simulator can mimic the real production system in order to show the feasibility and benefits of these concepts on different hardware setups.