Hasso-Plattner-Institut
Prof. Dr. h.c. Hasso Plattner
  
 

HYRISE - The Open-Source In-Memory Research DBMS

General Information

Hyrise is the research in-memory database system that our group has been developing since 2009 and that was entirely rewritten in 2017. Our goal is to provide a clean and flexible platform for research in the area of in-memory data management. Its architecture allows us, our students, and other researchers to conduct experiments around new data management concepts. To enable realistic experiments, Hyrise features comprehensive SQL support and performs powerful query plan optimizations. Well-known benchmarks, such as TPC-H or TPC-DS, can be executed with a single command and without any preparation.

On top of the DBMS foundation of Hyrise, we build the “autonomous database”. The vision is to support database administrators (DBAs) in handling the growing complexity of not only the database systems themselves, but also of the stored data and the workload. For this, Hyrise monitors runtime parameters, predicts the impact of possible configuration changes, and automatically applies those changes that are deemed beneficial. These changes may include the creation of secondary indexes, the (re-)encoding of columns, or the eviction of unused data to lower memory and storage tiers.

To foster reuse and reproduction, Hyrise is completely open source and available on GitHub. We value high-quality code (C++20), as documented by strict code reviews, a test coverage of 90%, various linting and static code analysis tools, and a comment-to-code ratio of 1:4.

You are invited to read our Step-by-Step Guide and to contact us with any questions that may arise.

The project team consists of Markus Dreseler, Jan Kossmann, Martin Boissier, Stefan Halfpap, Keven Richly, Dr. Michael Perscheid, and Prof. Dr. h.c. Hasso Plattner. We thank all student contributors, without whom this work would not have been possible.

Architecture

Hyrise consists of two parts. First, the DBMS Foundation comprises the components that are necessary to store data and execute queries. Second, the Autonomous Database, which is described below, is responsible for automatically tuning the system. The architecture diagram above visualizes these two parts.

Users can interact with Hyrise using one of three interfaces: First, the CLI Console offers features beyond those traditionally known from command line clients. These include the inline visualization of query plans in the form of annotated graphs. Second, Hyrise supports the PostgreSQL wire protocol and can thus be accessed using the psql client or compatible libraries. Finally, the benchmark binaries are a one-stop solution for executing different benchmarks and obtaining human- and machine-readable benchmark results.

Regardless of the interface used, SQL queries enter the SQL Pipeline, which transforms the query string into a logical query plan. This plan is then optimized, translated into a physical plan, and finally executed. We discussed the different optimization steps and quantified their impact here.

Hyrise stores table data in so-called chunks. A chunk is a fine-granular, horizontal partition of the table with a predefined number of rows. New rows are inserted into the last chunk of the table. Once this chunk reaches its target size, it is marked as immutable and a new mutable chunk is appended. Chunks are used as a flexible basis for indexes, filters, and statistics. Internally, chunks hold one segment per column of the table. This makes Hyrise a primarily column-oriented DBMS. Segments that are part of an immutable chunk may be encoded (i.e., compressed) asynchronously using one of several encoding schemes. By default, dictionary encoding is used.
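The chunk mechanism described above can be illustrated with a minimal sketch. All type and function names (Segment, Chunk, Table, encode_immutable_chunks) are hypothetical and greatly simplified; they are not the actual Hyrise classes. Values are restricted to strings, and encoding runs synchronously on request rather than asynchronously.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>
#include <vector>

// Hypothetical, simplified types for illustration only.
using Value = std::string;

// The values of one column within one chunk.
struct Segment {
  std::vector<Value> values;         // unencoded values (while the chunk is mutable)
  std::vector<Value> dictionary;     // sorted distinct values (after encoding)
  std::vector<std::uint32_t> codes;  // per-row position in the dictionary
  bool encoded = false;

  Value get(std::size_t row) const {
    return encoded ? dictionary[codes[row]] : values[row];
  }

  // Dictionary encoding: keep each distinct value once, rows only store codes.
  void dictionary_encode() {
    dictionary = values;
    std::sort(dictionary.begin(), dictionary.end());
    dictionary.erase(std::unique(dictionary.begin(), dictionary.end()),
                     dictionary.end());
    codes.reserve(values.size());
    for (const auto& v : values) {
      const auto it = std::lower_bound(dictionary.begin(), dictionary.end(), v);
      codes.push_back(static_cast<std::uint32_t>(it - dictionary.begin()));
    }
    values.clear();
    values.shrink_to_fit();
    encoded = true;
  }
};

// A horizontal partition of the table: one segment per column.
struct Chunk {
  explicit Chunk(std::size_t column_count) : segments(column_count) {}
  std::vector<Segment> segments;
  std::size_t row_count = 0;
  bool is_mutable = true;
};

class Table {
 public:
  Table(std::size_t column_count, std::size_t target_chunk_size)
      : column_count_(column_count), target_chunk_size_(target_chunk_size) {
    assert(target_chunk_size_ > 0);
    chunks_.emplace_back(column_count_);
  }

  // New rows always go into the last (mutable) chunk.
  void append(const std::vector<Value>& row) {
    assert(row.size() == column_count_);
    Chunk& last = chunks_.back();
    for (std::size_t c = 0; c < column_count_; ++c) {
      last.segments[c].values.push_back(row[c]);
    }
    ++last.row_count;
    if (last.row_count == target_chunk_size_) {
      last.is_mutable = false;              // chunk is full: freeze it
      chunks_.emplace_back(column_count_);  // and append a new mutable chunk
    }
  }

  // Only segments of immutable chunks are eligible for encoding.
  void encode_immutable_chunks() {
    for (auto& chunk : chunks_) {
      if (chunk.is_mutable) continue;
      for (auto& segment : chunk.segments) {
        if (!segment.encoded) segment.dictionary_encode();
      }
    }
  }

  const std::vector<Chunk>& chunks() const { return chunks_; }

 private:
  std::size_t column_count_;
  std::size_t target_chunk_size_;
  std::vector<Chunk> chunks_;
};
```

With a target chunk size of 2, appending three rows leaves one immutable chunk that can be encoded and one mutable chunk holding the third row.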

Autonomous Database

The DBMS Foundation is the basis for our autonomous database. We support a number of tuning options that can be used to optimize the system’s performance. Among them are the automatic selection of encoding mechanisms, data-driven partitioning, and the automatic migration of data between tiers. Many of these are developed as parts of individual research projects. As such, they are subject to frequent changes. To facilitate the independent development of these tuning options, we have decoupled them from the Hyrise core and implement them in the form of plugins.
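As a rough illustration of this decoupling, the following sketch shows how tuning options could sit behind a small plugin interface. The names AbstractTuningPlugin, PluginRegistry, and EncodingSelectionPlugin are assumptions made for this example and do not reflect the actual Hyrise plugin API; in particular, the sketch registers plugins in-process instead of loading them from separate libraries.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <memory>
#include <string>
#include <utility>

// Hypothetical interface that every tuning plugin implements.
class AbstractTuningPlugin {
 public:
  virtual ~AbstractTuningPlugin() = default;
  virtual std::string description() const = 0;
  virtual void start() = 0;  // called when the plugin is loaded
  virtual void stop() = 0;   // called before the plugin is unloaded
};

// The core only knows the interface, not the concrete plugins.
class PluginRegistry {
 public:
  // Returns false if the plugin is null or the name is already taken.
  bool load(const std::string& name,
            std::unique_ptr<AbstractTuningPlugin> plugin) {
    if (!plugin || plugins_.count(name) > 0) return false;
    plugin->start();
    plugins_.emplace(name, std::move(plugin));
    return true;
  }

  // Returns false if no plugin with this name is loaded.
  bool unload(const std::string& name) {
    const auto it = plugins_.find(name);
    if (it == plugins_.end()) return false;
    it->second->stop();
    plugins_.erase(it);
    return true;
  }

  std::size_t size() const { return plugins_.size(); }

 private:
  std::map<std::string, std::unique_ptr<AbstractTuningPlugin>> plugins_;
};

// Example plugin; the tuning logic itself is omitted.
class EncodingSelectionPlugin : public AbstractTuningPlugin {
 public:
  std::string description() const override {
    return "Selects an encoding for each segment";
  }
  void start() override { running_ = true; }
  void stop() override { running_ = false; }

 private:
  bool running_ = false;
};
```

Because the registry depends only on the abstract interface, a plugin under active research can change or be replaced without touching the core.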

At the same time, many tuning options have shared requirements. For an efficient selection of encoding mechanisms, where less frequently accessed segments are compressed more heavily, the number of accesses to these segments has to be tracked. The same information is needed by the automatic tiering plugin. Overlaps between plugins can be found not only in their input data, but also in their internal mechanisms. For example, the mentioned plugins both aim at balancing two competing goals, i.e., reducing the DRAM footprint without negatively affecting the query throughput. In the long run, we plan for these shared requirements to be fulfilled by the driver in the Hyrise core.
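The shared requirement of access tracking can be sketched as follows. This is an illustrative example under assumptions: the names AccessCounter, Thresholds, and choose_encoding, the three encoding levels, and the threshold-based rule are hypothetical and are not taken from the Hyrise code base.

```cpp
#include <atomic>
#include <cassert>
#include <cstdint>

// Hypothetical encoding levels, ordered from light to heavy compression.
enum class Encoding { Unencoded, Dictionary, HeavyCompression };

// Per-segment access counter. Relaxed atomics are sufficient here because
// the count is only used as a statistic, not for synchronization.
class AccessCounter {
 public:
  void record(std::uint64_t accesses = 1) {
    count_.fetch_add(accesses, std::memory_order_relaxed);
  }
  std::uint64_t value() const {
    return count_.load(std::memory_order_relaxed);
  }

 private:
  std::atomic<std::uint64_t> count_{0};
};

// Hypothetical tuning knobs: access counts that separate hot and cold segments.
struct Thresholds {
  std::uint64_t hot;   // at or above: keep the segment cheap to read
  std::uint64_t cold;  // at or below: favor a small memory footprint
};

// Less frequently accessed segments are compressed more heavily.
Encoding choose_encoding(const AccessCounter& counter, Thresholds thresholds) {
  const std::uint64_t accesses = counter.value();
  if (accesses >= thresholds.hot) return Encoding::Unencoded;
  if (accesses <= thresholds.cold) return Encoding::HeavyCompression;
  return Encoding::Dictionary;
}
```

The same counter could feed a tiering decision, which is why tracking accesses once in a shared place avoids duplicated bookkeeping in each plugin.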

The driver takes input parameters from runtime KPIs (e.g., the system utilization, the number of accesses to individual segments, and more), the constraints defined by the DBA, and the options provided by the different plugins. Based on these parameters, it makes decisions in a centralized manner. These decisions can then be realized by the different plugins.
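A centralized decision of this kind could look like the following sketch. It assumes that each plugin reports its options with a predicted benefit and a DRAM cost, and that the DBA constraint is a single memory budget. The names TuningOption, Constraints, and decide, as well as the greedy strategy, are assumptions chosen for illustration; the source does not specify the driver's actual algorithm.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <string>
#include <vector>

// Hypothetical option as offered by a plugin.
struct TuningOption {
  std::string plugin;               // plugin that would realize the option
  std::string description;          // human-readable summary
  double predicted_benefit;         // estimated gain, e.g., in throughput
  std::int64_t memory_delta_bytes;  // > 0 consumes DRAM, < 0 frees DRAM
};

// Hypothetical DBA constraint: DRAM that may additionally be used.
struct Constraints {
  std::int64_t memory_budget_bytes;
};

// Greedy heuristic: consider options by descending predicted benefit and
// accept each one that is beneficial and still fits into the budget.
std::vector<TuningOption> decide(std::vector<TuningOption> options,
                                 Constraints constraints) {
  std::stable_sort(options.begin(), options.end(),
                   [](const TuningOption& a, const TuningOption& b) {
                     return a.predicted_benefit > b.predicted_benefit;
                   });
  std::vector<TuningOption> accepted;
  std::int64_t remaining = constraints.memory_budget_bytes;
  for (const auto& option : options) {
    if (option.predicted_benefit <= 0.0) continue;        // not beneficial
    if (option.memory_delta_bytes > remaining) continue;  // exceeds budget
    remaining -= option.memory_delta_bytes;  // freeing memory grows the budget
    accepted.push_back(option);
  }
  return accepted;
}
```

Deciding in one place lets the driver weigh options from different plugins against the same budget, which individual plugins could not do on their own.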

Additional Resources

Besides our publications (see below), we are also documenting our progress with Hyrise in the Hyrise Wiki, on our Medium Blog and on the Hyrise Twitter channel.

Selected Publications

  • Kossmann, J., Halfpap, S., Jankrift, M., Schlosser, R.: Magic mirror in my hand, which is the best in the land? An Experimental Evaluation of Index Selection Algorithms. Proceedings of the VLDB Endowment. pp. 2382-2395 (2020).

  • Kossmann, J., Schlosser, R.: Self-driving database systems: a conceptual approach. Distributed and Parallel Databases. 38, 795-817 (2020).

  • Dreseler, M., Boissier, M., Rabl, T., Uflacker, M.: Quantifying TPC-H Choke Points and Their Optimizations. Proceedings of the VLDB Endowment. pp. 1206-1220 (2020).

  • Schlosser, R., Halfpap, S.: A Decomposition Approach for Risk-Averse Index Selection. 32nd International Conference on Scientific and Statistical Database Management (SSDBM 2020). pp. 16:1-16:4 (2020).

  • Dreseler, M.: Storing STL Containers on NVM. Persistent Programming in Real Life (2019).

  • Boissier, M., Jendruk, M.: Workload-Driven and Robust Selection of Compression Schemes for Column Stores. 22nd International Conference on Extending Database Technology (EDBT). pp. 674-677 (2019).

  • Dreseler, M., Kossmann, J., Boissier, M., Klauck, S., Uflacker, M., Plattner, H.: Hyrise Re-engineered: An Extensible Database System for Research in Relational In-Memory Data Management. 22nd International Conference on Extending Database Technology (EDBT). pp. 313-324 (2019).

  • Kossmann, J., Dreseler, M., Gasda, T., Uflacker, M., Plattner, H.: Visual Evaluation of SQL Plan Cache Algorithms. Australasian Database Conference (ADC) (2018).

  • Dreseler, M., Gasda, T., Kossmann, J., Uflacker, M., Plattner, H.: Adaptive Access Path Selection for Hardware-Accelerated DRAM Loads. Australasian Database Conference (ADC) (2018).

  • Dreseler, M., Kossmann, J., Frohnhofen, J., Uflacker, M., Plattner, H.: Fused Table Scans: Combining AVX-512 and JIT to Double the Performance of Multi-Predicate Scans. Joint Workshop of HardBD (International Workshop on Big Data Management on Emerging Hardware) and Active (Workshop on Data Management on Virtualized Active Systems), in conjunction with ICDE (2018).

  • Schmidt, C., Dreseler, M., Akin, B., Roy, A.: A Case for Hardware-Supported Sub-Cache Line Accesses. Data Management on New Hardware (DaMoN), in conjunction with SIGMOD (2018).

  • Schwalb, D., Bk, G.K., Dreseler, M., S, A., Faust, M., Hohl, A., Berning, T., Makkar, G., Plattner, H., Deshmukh, P.: Hyrise-NV: Instant Recovery for In-Memory Databases using Non-Volatile Memory. International Conference on Database Systems for Advanced Applications (DASFAA) (2016).

  • Schwalb, D., Dreseler, M., Uflacker, M., Plattner, H.: NVC-Hashmap: A Persistent and Concurrent Hashmap For Non-Volatile Memories. In-Memory Data Management Workshop (IMDM), in conjunction with VLDB (2015).

  • Schwalb, D., Kossmann, J., Faust, M., Klauck, S., Uflacker, M., Plattner, H.: Hyrise-R: Scale-out and Hot-Standby through Lazy Master Replication for Enterprise Applications. Proceedings of the 3rd VLDB Workshop on In-Memory Data Management and Analytics (IMDM), in conjunction with VLDB 2015, Kohala Coast, Hawaii (2015).

  • Faust, M., Schwalb, D., Plattner, H.: Composite Group-Keys: Space-efficient Indexing of Multiple Columns for Compressed In-Memory Column Stores. IMDM, in conjunction with VLDB (2014).

  • Schwalb, D., Faust, M., Wust, J., Grund, M., Plattner, H.: Efficient Transaction Processing for Hyrise in Mixed Workload Environments. IMDM, in conjunction with VLDB (2014).

  • Grund, M., Cudre-Mauroux, P., Krüger, J., Madden, S., Plattner, H.: An overview of HYRISE - a Main Memory Hybrid Storage Engine. IEEE Data Engineering Bulletin (2012).

  • Faust, M., Krüger, J., Schwalb, D., Plattner, H.: Fast Lookups for In-Memory Column Stores: Group-Key Indices, Lookup and Maintenance. ADMS, in conjunction with VLDB (2012).

  • Grund, M., Krüger, J., Plattner, H., Zeier, A., Cudre-Mauroux, P., Madden, S.: HYRISE - A Hybrid Main Memory Storage Engine. Proceedings of the VLDB Endowment, Volume 4, Issue 2. pp. 105-116 (2011).

  • Grund, M., Cudre-Mauroux, P., Madden, S.: A Demonstration of HYRISE - A Main Memory Hybrid Storage Engine. VLDB (2011).