Richly, K., Schlosser, R., Boissier, M.: Budget-Conscious Fine-Grained Configuration Optimization for Spatio-Temporal Applications. Proceedings of the VLDB Endowment. pp. 4079–4092 (2022).
Based on the performance requirements of modern spatio-temporal data mining applications, in-memory database systems are often used to store and process the data. To efficiently utilize the scarce DRAM capacity, modern database systems support various tuning options that reduce the memory footprint (e.g., data compression) or increase performance (e.g., additional indexes). However, selecting configurations that balance cost and performance is challenging due to the vast number of possible setups, which consist of mutually dependent individual decisions. In this paper, we introduce a novel approach to jointly optimize the compression, sorting, indexing, and tiering configuration for spatio-temporal workloads. Further, we consider horizontal data partitioning, which enables the independent application of different tuning options on a fine-grained level. We propose different linear programming (LP) models that address cost dependencies at different levels of accuracy to compute optimized tuning configurations for a given workload and memory budgets. To yield maintainable and robust configurations, we extend our LP-based approach to incorporate reconfiguration costs as well as a worst-case optimization for potential workload scenarios. Finally, we demonstrate on a real-world dataset that, compared to existing tuning heuristics, our models significantly reduce the memory footprint at equal performance or increase performance at equal memory size.
Boissier, M.: Robust and Budget-Constrained Encoding Configurations for In-Memory Database Systems. Proceedings of the VLDB Endowment. pp. 780–793 (2022).
Data encoding has been applied to database systems for decades as it mitigates bandwidth bottlenecks and reduces storage requirements. Despite these advantages, most in-memory database systems use data encoding only conservatively, as the negative impact on runtime performance can be severe. Real-world systems in which large parts of the data are infrequently accessed, as well as cost-efficiency constraints in cloud environments, require solutions that automatically and efficiently select encoding techniques, including heavy-weight compression. In this paper, we introduce workload-driven approaches to automatically determine memory budget-constrained encoding configurations using greedy heuristics and linear programming. We show for TPC-H, TPC-DS, and the Join Order Benchmark that optimized encoding configurations can reduce the main memory footprint significantly without a loss in runtime performance compared to state-of-the-art dictionary encoding. To yield robust selections, we extend the linear programming-based approach to incorporate query runtime constraints and mitigate unexpected performance regressions.
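To illustrate what a greedy, memory budget-constrained encoding selection can look like, the following is a minimal sketch and not the algorithm from the paper. It assumes that, for every column, a list of candidate encodings with an estimated size and an estimated runtime cost is already known; the column names, encoding names, and all numbers are hypothetical. Every column starts with its fastest option, and the sketch then repeatedly applies the switch that saves the most memory per unit of added runtime until the budget is met.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass(frozen=True)
class Option:
    encoding: str
    size: int       # hypothetical memory footprint estimate in bytes
    runtime: float  # hypothetical workload runtime estimate for this choice


def greedy_select(columns: Dict[str, List[Option]], budget: int) -> Dict[str, Option]:
    """Greedy sketch: start with the fastest option per column, then apply the
    switch with the best ratio of memory saved to runtime added until the total
    size fits into the budget."""
    choice = {col: min(opts, key=lambda o: (o.runtime, o.size)) for col, opts in columns.items()}

    def total_size() -> int:
        return sum(o.size for o in choice.values())

    while total_size() > budget:
        best, best_ratio = None, -1.0
        for col, opts in columns.items():
            current = choice[col]
            for opt in opts:
                saved = current.size - opt.size
                if saved <= 0:
                    continue  # only consider switches that shrink the column
                # Options that are smaller *and* faster get a huge ratio.
                added = max(opt.runtime - current.runtime, 1e-9)
                if saved / added > best_ratio:
                    best, best_ratio = (col, opt), saved / added
        if best is None:
            raise ValueError("budget cannot be met with the given options")
        choice[best[0]] = best[1]
    return choice


# Hypothetical example with two columns and made-up estimates.
columns = {
    "l_comment": [Option("Dictionary", 100, 1.0), Option("LZ4", 30, 3.0)],
    "l_quantity": [Option("Dictionary", 20, 1.0), Option("FrameOfReference", 10, 1.1)],
}
selection = greedy_select(columns, budget=100)
```

Here the cheap switch of `l_quantity` is taken first, but it is not sufficient, so `l_comment` is compressed as well. A greedy heuristic like this can miss better combinations, which is one motivation for the LP-based formulation mentioned in the abstract.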
Heinzl, L., Hurdelhey, B., Boissier, M., Perscheid, M., Plattner, H.: Evaluating Lightweight Integer Compression Algorithms in Column-Oriented In-Memory DBMS. 12th International Workshop on Accelerating Analytics and Data Management Systems Using Modern Processor and Storage Architectures, ADMS@VLDB 2021, Copenhagen, Denmark, August 16, 2021 (2021).
Lightweight data compression algorithms are often used to decrease the memory consumption of in-memory databases. In recent years, various integer compression techniques have been proposed that focus on sequential encoding and decoding and exploit the vectorization capabilities of modern CPUs. Interestingly, another dominant access pattern in database systems has seen little attention: random access decoding. In this paper, we compare end-to-end database performance for various integer compression codecs on three recent CPU architectures. Our evaluation suggests that random access performance is often more relevant than vectorization capabilities for sequential accesses. Before integrating selected encodings into the database core, we benchmarked seven libraries in an exhaustive standalone comparison. We integrated the most promising techniques into the relational in-memory database system Hyrise and evaluated their performance for TPC-H, TPC-DS, and the Join Order Benchmark on three different CPU architectures. Our results emphasize the importance of random access decoding. Compared to state-of-the-art dictionary encoding in TPC-H, alternative encodings reduce the memory consumption of integer columns by up to 53 % while improving runtime performance by 5 % on an Intel CPU and by over 16 % on an Apple M1.
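The difference between sequential and random access decoding can be made concrete with plain fixed-width bit packing. The sketch below is an illustration under that assumption only; it is not one of the benchmarked libraries and uses no SIMD. Because every value occupies the same number of bits, `get` can compute the byte offset of value `index` directly and decode only those bytes instead of unpacking the whole block first. The functions `pack` and `get` are hypothetical helpers.

```python
from typing import List


def pack(values: List[int], bits: int) -> bytes:
    """Pack non-negative integers into a contiguous little-endian bit stream
    in which every value occupies exactly `bits` bits."""
    acc = 0
    for i, v in enumerate(values):
        if v < 0 or v >= (1 << bits):
            raise ValueError(f"{v} does not fit into {bits} bits")
        acc |= v << (i * bits)
    nbytes = (len(values) * bits + 7) // 8
    return acc.to_bytes(nbytes, "little")


def get(packed: bytes, bits: int, index: int) -> int:
    """Random access decoding: read only the bytes that hold value `index`."""
    start_bit = index * bits
    first = start_bit // 8
    last = (start_bit + bits - 1) // 8
    chunk = int.from_bytes(packed[first:last + 1], "little")
    return (chunk >> (start_bit % 8)) & ((1 << bits) - 1)


values = [3, 17, 0, 31, 9]
bits = max(values).bit_length()  # 5 bits suffice for values up to 31
packed = pack(values, bits)      # 25 bits, stored in 4 bytes instead of 5 * 4 bytes
```

Codecs that pack values in variable-width blocks or delta-encode them usually need to decode a larger portion of the data to reach a single position, which is why random access performance can differ so much from sequential decoding throughput.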
Klauck, S., Plauth, M., Knebel, S., Strobl, M., Santry, D., Eggert, L.: Eliminating the Bandwidth Bottleneck of Central Query Dispatching Through TCP Connection Hand-Over. Datenbanksysteme für Business, Technologie und Web (BTW), 18. Fachtagung des GI-Fachbereichs "Datenbanken und Informationssysteme" (DBIS). pp. 97–106 (2019).
Dreseler, M., Kossmann, J., Boissier, M., Klauck, S., Uflacker, M., Plattner, H.: Hyrise Re-engineered: An Extensible Database System for Research in Relational In-Memory Data Management. 22nd International Conference on Extending Database Technology (EDBT). pp. 313–324 (2019).
Hiller, J., Kimmerlin, M., Plauth, M., Heikkila, S., Klauck, S., Lindfors, V., Eberhardt, F., Bursztynowski, D., Santos, J.L., Hohlfeld, O., Wehrle, K.: Giving Customers Control over Their Data: Integrating a Policy Language into the Cloud. 2018 IEEE International Conference on Cloud Engineering (IC2E) (2018).
Cloud computing offers the potential to store, manage, and process data in highly available, scalable, and elastic environments. Yet, these environments still provide very limited and inflexible means for customers to control their data. For example, customers can neither specify the security of inter-cloud communication, which bears the risk of information leakage, nor comply with laws requiring data to be kept in the originating jurisdiction, nor control the sharing of data with third parties on a fine-grained basis. This lack of control can hinder cloud adoption for data that falls under regulations. In this paper, we show in six use cases how cloud environments can be enriched with policy language support to give customers control over cloud data. Our use cases are based on realizing policy language support in all three cloud environment layers, i.e., IaaS, PaaS, and SaaS. Specifically, we present policy-aware resource management (with OpenStack) and dynamic network configuration. With CERN's big data storage and the in-memory database Hyrise, we show its realization for storage and further exemplify policy-aware cloud processing by network function virtualization, which enables Orange to offload customer home gateways to the cloud. Finally, we discuss the benefits of policy support in F-Secure's Security Cloud. These use cases show the feasibility of realizing customer control with policy support in the cloud. Thus, our work enables customers with regulated data to tap cloud benefits and significantly broadens the market for cloud providers.
Klauck, S.: Scalability, Availability, and Elasticity through Database Replication in Hyrise-R. Proceedings of the 4th HPI Cloud Symposium “Operating the Cloud” 2016. pp. 1–10 (2017).
The growing analytical demand increases the importance of scalability and elasticity for mixed-workload in-memory databases. Data replication is a way to cope with the growing demand and additionally increases availability. In this paper, we describe different replication mechanisms, balancing query performance and availability. In addition, we outline how we implemented the cloud-ilities scalability, availability, and elasticity in Hyrise-R, a replication extension of the in-memory database Hyrise. Finally, we summarize further current research activities within the Hyrise project, i.e., data tiering, self-adaptation, and non-volatile RAM.
Lindemann, J., Klauck, S., Schwalb, D.: A Scalable Query Dispatcher for Hyrise-R. Proceedings of the 3rd HPI Cloud Symposium “Operating the Cloud” 2015. pp. 25–32 (2016).
While single machines can handle the transactional database workload of most companies, the increasing analytical load will push them to their limit. For this reason, we extended the open source in-memory database Hyrise with the capability to form a database cluster for scalability and increased availability. This scale-out and hot-standby version is called Hyrise-R. It implements lazy master replication and has been shown to be well suited for mixed workloads such as those found in enterprise applications. In this paper, we present our extension of Hyrise-R: a query dispatcher that works fully transparently and implements an enhanced query distribution algorithm. The new distribution algorithm improves load balancing and prioritizes write requests for higher transaction throughput. In addition, we discuss our work in progress and planned activities for Hyrise-R.
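The two ideas named for the dispatcher, routing under lazy master replication and prioritizing writes, can be sketched as follows. This is a hypothetical illustration, not the Hyrise-R dispatcher: the class `Dispatcher`, the keyword-based write detection, and the least-loaded read placement are assumptions made for the example. Writes must go to the master, while reads may be served by any instance; queued writes are dispatched before queued reads.

```python
import heapq
import itertools
from typing import List, Tuple


class Dispatcher:
    """Hypothetical dispatcher sketch: writes go to the master, reads go to the
    least-loaded instance, and queued writes are dispatched before reads."""

    WRITE_PREFIXES = ("INSERT", "UPDATE", "DELETE")  # simplistic write detection

    def __init__(self, master: str, replicas: List[str]):
        self.master = master
        self.load = {name: 0 for name in [master, *replicas]}  # open queries per instance
        self._queue: List[Tuple[int, int, str]] = []  # (priority, sequence, query)
        self._seq = itertools.count()  # keeps FIFO order within one priority class

    def submit(self, query: str) -> None:
        is_write = query.lstrip().upper().startswith(self.WRITE_PREFIXES)
        priority = 0 if is_write else 1  # lower value is dispatched first
        heapq.heappush(self._queue, (priority, next(self._seq), query))

    def dispatch(self) -> Tuple[str, str]:
        """Pop the next query and return (target instance, query)."""
        priority, _, query = heapq.heappop(self._queue)
        if priority == 0:
            target = self.master
        else:
            target = min(self.load, key=self.load.get)
        self.load[target] += 1
        return target, query

    def complete(self, instance: str) -> None:
        """Mark one query on `instance` as finished."""
        self.load[instance] -= 1


d = Dispatcher("master", ["replica1", "replica2"])
d.submit("SELECT * FROM orders")
d.submit("UPDATE stock SET qty = 0")
d.submit("SELECT count(*) FROM stock")
```

Although the update is submitted second, it is dispatched first and lands on the master; the two reads are then spread across the replicas because the master already holds one open query.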
Klauck, S., Butzmann, L., Müller, S., Faust, M., Schwalb, D., Uflacker, M., Sinzig, W., Plattner, H.: Interactive, Flexible, and Generic What-If Analyses Using In-Memory Column Stores. Database Systems for Advanced Applications. pp. 488–497 (2015).
One well-established method of measuring the success of companies is the use of key performance indicators, whose inter-dependencies can be represented by mathematical models, such as value driver trees. While such models have commonly agreed semantics, they lack the right tool support for business simulations, because a flexible implementation that supports multi-dimensional and hierarchical structures on large data sets is complex and computationally challenging. However, in-memory column stores, as the backbone of enterprise applications, provide high performance that enables calculating flexible simulation scenarios interactively, even on large sets of enterprise data.
Butzmann, L., Klauck, S., Mueller, S., Uflacker, M., Plattner, H., Sinzig, W.: Generic Business Simulation Using an In-Memory Column Store. Datenbanksysteme für Business, Technologie und Web (BTW), 16. Fachtagung des GI-Fachbereichs "Datenbanken und Informationssysteme" (DBIS). pp. 633–643 (2015).
Value driver trees are a well-known methodology to model dependencies such as the definition of key performance indicators. While the models have well-known semantics, they lack the right tool support for business simulations, because a flexible implementation that supports multidimensional, hierarchical value driver trees and data bindings is very complex and computationally challenging. This paper tackles this problem by proposing an approach for generic enterprise simulations which are based on value driver trees. Our approach is two-fold: we present the definition of a simulation meta model at design time, and the run-time simulation tool. The simulation meta model describes the structure of the dependency graph, the data binding, and the parametrization of the model to simulate data changes. The simulation tool can then be used to create and edit simulation model instances and run simulations in real-time by leveraging an in-memory column store. Besides the formal description of the approach, this work presents a prototypical implementation of the simulation tool and an evaluation using data of a consumer packaged goods company.
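A value driver tree can be read as an expression tree whose leaves are bound to data and whose inner nodes combine their children, and a simulation amounts to overriding some inputs and recomputing the root. The sketch below shows only this evaluation idea under simplifying assumptions: leaves hold constants instead of bindings to column-store data, and the node names, operators, and numbers are hypothetical rather than taken from the meta model in the papers.

```python
import math
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Optional


@dataclass
class Node:
    name: str
    op: Optional[Callable[[List[float]], float]] = None  # None marks a leaf
    children: List["Node"] = field(default_factory=list)
    value: float = 0.0  # leaf value; in a real tool this would be bound to stored data


def evaluate(node: Node, overrides: Optional[Dict[str, float]] = None) -> float:
    """Recursively evaluate the tree; `overrides` simulates changed inputs."""
    overrides = overrides or {}
    if node.name in overrides:
        return overrides[node.name]
    if node.op is None:
        return node.value
    return node.op([evaluate(child, overrides) for child in node.children])


# Hypothetical tree: profit = price * quantity - cost
price = Node("price", value=10.0)
quantity = Node("quantity", value=100.0)
revenue = Node("revenue", op=math.prod, children=[price, quantity])
cost = Node("cost", value=600.0)
profit = Node("profit", op=lambda xs: xs[0] - xs[1], children=[revenue, cost])

baseline = evaluate(profit)                      # current state
scenario = evaluate(profit, {"price": 11.0})     # what-if: raise the price by 10 %
```

Keeping overrides separate from the tree leaves the baseline untouched, so several what-if scenarios can be evaluated against the same model and compared side by side.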
Plattner, H., Mueller, S., Nica, A., Butzmann, L., Klauck, S.: Using Object-Awareness to Optimize Join Processing in the SAP HANA Aggregate Cache. Proceedings of the 18th International Conference on Extending Database Technology (EDBT), Brussels, Belgium (2015).
Schwalb, D., Kossmann, J., Faust, M., Klauck, S., Uflacker, M., Plattner, H.: Hyrise-R: Scale-out and Hot-Standby through Lazy Master Replication for Enterprise Applications. Proceedings of the 3rd VLDB Workshop on In-Memory Data Management and Analytics (IMDM), in conjunction with VLDB 2015 Kohala Coast, Hawaii (2015).
In-memory database systems are well-suited for enterprise workloads, consisting of transactional and analytical queries. A growing number of users and an increasing demand for enterprise applications can saturate or even overload single-node database systems at peak times. Better performance can be achieved by improving a single machine's hardware, but it is often cheaper and more practicable to follow a scale-out approach and replicate data by using additional machines. In this paper, we present Hyrise-R, a lazy master replication system for the in-memory database Hyrise. By setting up a snapshot-based Hyrise cluster, we increase both performance, by distributing queries over multiple instances, and availability, by utilizing the redundancy of the cluster structure. This paper describes the architecture of Hyrise-R and details of the implemented replication mechanisms. We set up Hyrise-R on instances of Amazon's Elastic Compute Cloud and present a detailed performance evaluation of our system, including a linear query throughput increase for enterprise workloads.
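The core idea of snapshot-based lazy master replication can be illustrated with a toy key-value model. This is a simplified sketch, not Hyrise-R's mechanism: it assumes that the master applies every write locally and appends it to a write log, that a new replica starts from a snapshot plus a log position, and that replicas catch up only when `sync` is called. The classes `Master` and `Replica` are hypothetical.

```python
from typing import Dict, List, Optional, Tuple


class Master:
    """Accepts all writes and records them in an append-only log."""

    def __init__(self) -> None:
        self.data: Dict[str, int] = {}
        self.log: List[Tuple[str, int]] = []

    def write(self, key: str, value: int) -> None:
        self.data[key] = value
        self.log.append((key, value))

    def snapshot(self) -> Tuple[Dict[str, int], int]:
        """Return a copy of the current state and the log position it reflects."""
        return dict(self.data), len(self.log)


class Replica:
    """Read-only copy that is bootstrapped from a snapshot and updated lazily."""

    def __init__(self, master: Master) -> None:
        self.master = master
        self.data, self.applied = master.snapshot()

    def sync(self) -> None:
        """Apply all log entries written since the last sync."""
        for key, value in self.master.log[self.applied:]:
            self.data[key] = value
        self.applied = len(self.master.log)

    def read(self, key: str) -> Optional[int]:
        return self.data.get(key)


master = Master()
master.write("stock", 10)
replica = Replica(master)   # bootstrapped from a snapshot
master.write("stock", 7)    # not yet visible on the replica
stale = replica.read("stock")
replica.sync()
fresh = replica.read("stock")
```

The gap between `stale` and `fresh` is the price of lazy replication: writes commit on the master without waiting for replicas, which keeps transaction latency low, while reads on replicas may briefly observe an older state.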
Mueller, S., Butzmann, L., Klauck, S., Plattner, H.: An Adaptive Aggregate Maintenance Approach for Mixed Workloads in Columnar In-Memory Databases. Proceedings of the Thirty-Seventh Australasian Computer Science Conference (ACSC ’14) - Volume 147. pp. 3–12 (2014).
Mueller, S., Butzmann, L., Klauck, S., Plattner, H.: Materialized View Maintenance Leveraging In-Memory Data Structures. International Journal On Advances in Software, vol. 7, no. 3&4. (2014).
Mueller, S., Butzmann, L., Klauck, S., Plattner, H.: Workload-Aware Aggregate Maintenance in Columnar In-Memory Databases. IEEE International Conference on Big Data (IEEE Big Data 2013), Silicon Valley, USA (2013).
Mueller, S., Butzmann, L., Höwelmeyer, K., Klauck, S., Plattner, H.: Efficient View Maintenance for Enterprise Applications in Columnar In-Memory Databases. 17th IEEE International Enterprise Distributed Object Computing Conference (EDOC), Vancouver, Canada (2013).
Zeier, A., Plattner, H., Butzmann, L., Klauck, S., Tinnefeld, C., Mueller, S.: Available-To-Promise on an In-Memory Column Store. Datenbanksysteme für Business, Technologie und Web (BTW 2011), 14. Fachtagung des GI-Fachbereichs "Datenbanken und Informationssysteme" (DBIS), Proceedings, Kaiserslautern, Germany (2011).
Available-To-Promise (ATP) is an application in the context of Supply Chain Management (SCM) systems and provides a checking mechanism that calculates whether the desired products of a customer order can be delivered on the requested date. Modern SCM systems store relevant data records as aggregated numbers, which implies the disadvantages of maintaining redundant data as well as inflexibility in querying the data. Our approach omits aggregates by storing all individual data records in an in-memory column store and scanning through all relevant records on-the-fly for each check. We contribute by describing the novel data organization and a lock-free, highly concurrent ATP checking algorithm. Additionally, we explain how new business functionality, such as instant rescheduling of orders, can be realized with our approach. All concepts are implemented within a prototype and benchmarked using an anonymized SCM dataset of a Fortune 500 consumer products company. The paper closes with a discussion of the results and gives an outlook on how this approach can help companies find the right balance between low inventory costs and high order fulfillment rates.
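The aggregate-free idea, answering each check by scanning the individual records instead of reading pre-aggregated totals, can be sketched as below. This is a simplified, single-threaded illustration and not the lock-free algorithm from the paper. It assumes that records are `(date, delta)` pairs, where positive deltas are stock receipts and negative deltas are already promised orders, and that a new promise must not make the cumulative stock negative at any later date, so the available quantity is the minimum cumulative balance from the requested date onwards. The function `atp_check` is hypothetical.

```python
from datetime import date
from typing import List, Tuple


def atp_check(records: List[Tuple[date, int]], requested: date, quantity: int) -> Tuple[bool, int]:
    """Scan all individual records (no stored aggregates) and return whether
    `quantity` can be promised on `requested`, plus the available quantity."""
    # Net stock on the requested date: every receipt and commitment up to that day.
    balance = sum(delta for day, delta in records if day <= requested)
    available = balance
    # A promise on `requested` lowers every later balance too, so later
    # commitments must still be covered: take the minimum future balance.
    for day, delta in sorted((d, q) for d, q in records if d > requested):
        balance += delta
        available = min(available, balance)
    return available >= quantity, available


records = [
    (date(2024, 1, 1), 100),   # stock receipt
    (date(2024, 1, 5), -80),   # already promised order
    (date(2024, 1, 10), 50),   # stock receipt
]
ok, available = atp_check(records, date(2024, 1, 3), 30)
```

Although 100 units are in stock on January 3, only 20 can be promised because 80 are already committed for January 5. Since nothing is pre-aggregated, changing a single record, for example when rescheduling an order, is immediately reflected in the next check.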