Hyrise: The Open-Source In-Memory Research DBMS

General Information

Hyrise is the research in-memory database system that has been developed at HPI since 2009 and has been entirely rewritten since 2016. Our goal is to provide a clean and flexible platform for research in the area of in-memory data management. Its architecture allows us, our students, and other researchers to conduct experiments around new data management concepts. To enable realistic experiments, Hyrise features comprehensive SQL support and performs powerful query plan optimizations. Well-known benchmarks, such as TPC-H or the Join Order Benchmark, can be executed with a single command and without any preparation.

To foster reuse and reproducibility, Hyrise is completely open source, written in C++, and available on GitHub.

The current project team consists of Martin Boissier, Daniel Lindner, and Marcel Weisgut. We thank all student contributors, without whom this work would not have been possible.

Database Architecture

Hyrise consists of two parts. Firstly, the DBMS Foundation comprises the components that are necessary to store data and execute queries. Secondly, the Autonomous Database, which will be described below, is responsible for automatically tuning the system. The architecture diagram above visualizes these two parts.

Users can interact with Hyrise using one of three interfaces: First, the CLI Console offers features beyond those traditionally known from command line clients. These include the inline visualization of query plans in the form of annotated graphs. Second, Hyrise supports the PostgreSQL wire protocol and can thus be accessed using the psql client or compatible libraries. Finally, the benchmark binaries are a one-stop solution for executing different benchmarks and obtaining human- and machine-readable benchmark results.

Regardless of the interface used, SQL queries enter the SQL Pipeline. It transforms the query string into a logical query plan, which is then optimized, translated into a physical plan, and finally executed. We discuss the different optimization steps and quantify their impact in "Quantifying TPC-H Choke Points and Their Optimizations" (see Publications).

Hyrise stores table data in so-called chunks. A chunk is a fine-grained, horizontal partition of the table with a predefined number of rows. New rows are inserted into the last chunk of the table. Once this chunk reaches its target size, it is marked as immutable and a new mutable chunk is appended. Chunks serve as a flexible basis for indexes, filters, and statistics. Internally, a chunk holds one segment per column of the table, which makes Hyrise a primarily column-oriented DBMS. Segments that are part of an immutable chunk may be encoded (i.e., compressed) asynchronously using one of several encoding schemes. By default, dictionary encoding is used.
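The chunk life cycle described above can be illustrated with a small sketch. This is not Hyrise code (Hyrise is written in C++); the class and function names (Chunk, Table, dictionary_encode) are hypothetical, and the use of a sorted dictionary is an assumption made for the example.

```python
class Chunk:
    """Horizontal partition that holds one segment (value list) per column."""

    def __init__(self, column_count):
        self.segments = [[] for _ in range(column_count)]
        self.immutable = False

    def row_count(self):
        return len(self.segments[0])

    def append(self, row):
        assert not self.immutable, "rows may only be added to a mutable chunk"
        for segment, value in zip(self.segments, row):
            segment.append(value)


class Table:
    """Table whose rows are always inserted into the last chunk."""

    def __init__(self, column_count, target_chunk_size):
        self.column_count = column_count
        self.target_chunk_size = target_chunk_size
        self.chunks = [Chunk(column_count)]

    def insert(self, row):
        last = self.chunks[-1]
        last.append(row)
        # Once the target size is reached, freeze the chunk and open a new one.
        if last.row_count() == self.target_chunk_size:
            last.immutable = True
            self.chunks.append(Chunk(self.column_count))


def dictionary_encode(segment):
    """Return (dictionary, codes); the dictionary is sorted by assumption."""
    dictionary = sorted(set(segment))
    position = {value: index for index, value in enumerate(dictionary)}
    return dictionary, [position[value] for value in segment]


table = Table(column_count=2, target_chunk_size=3)
for i in range(7):
    table.insert((i, "row"))
print([chunk.immutable for chunk in table.chunks])  # [True, True, False]
print(dictionary_encode(["b", "a", "b", "c"]))  # (['a', 'b', 'c'], [1, 0, 1, 2])
```

Only segments of immutable chunks would be passed to an encoder in this model, because the last chunk can still receive new rows.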

Research Activities

Data Compression & Tiering

Data compression and tiering are powerful methods to address the memory bottleneck and cost inefficiencies of in-memory databases. Automatically deciding which data compression technique to use in in-memory column stores is challenging due to trade-offs and non-obvious impacts on large workloads. We propose a solution for the automatic selection of budget-constrained encoding configurations in Hyrise, based on linear programming (LP) and greedy heuristics. The encoding configurations are robust with respect to runtime performance, adaptable, and workload-aware. To ensure performance robustness, LP techniques are applied to distribute performance gains equally over all queries. The results show the potential for significant memory budget reductions without a deterioration of runtime performance.
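To make the budget-constrained selection problem concrete, the following is a minimal greedy sketch. It is not the published LP model or the authors' heuristic: the function name, the input layout, and the size and runtime figures are all hypothetical. It starts with the fastest encoding for every segment and, while the budget is exceeded, applies the switch that costs the least extra runtime per byte saved.

```python
def greedy_encoding_selection(options, budget):
    """Pick one encoding per segment so that the total size fits the budget.

    options: {segment: {encoding: (size_in_bytes, runtime_cost)}}
    Returns {segment: encoding}. Raises ValueError if no selection fits.
    """
    # Start with the fastest encoding for each segment.
    choice = {
        segment: min(encodings, key=lambda name: encodings[name][1])
        for segment, encodings in options.items()
    }

    def total_size():
        return sum(options[seg][enc][0] for seg, enc in choice.items())

    while total_size() > budget:
        best = None  # (runtime penalty per byte saved, segment, encoding)
        for segment, encodings in options.items():
            current_size, current_cost = encodings[choice[segment]]
            for name, (size, cost) in encodings.items():
                saved = current_size - size
                if saved <= 0:
                    continue  # only consider switches that free memory
                ratio = (cost - current_cost) / saved
                if best is None or ratio < best[0]:
                    best = (ratio, segment, name)
        if best is None:
            raise ValueError("budget cannot be met with the given encodings")
        choice[best[1]] = best[2]
    return choice


# Illustrative numbers only: (size in bytes, relative runtime cost).
options = {
    "a": {"unencoded": (100, 1.0), "dictionary": (40, 1.5), "lz4": (20, 4.0)},
    "b": {"unencoded": (200, 2.0), "dictionary": (60, 2.2), "lz4": (30, 6.0)},
}
print(greedy_encoding_selection(options, budget=150))
# {'a': 'dictionary', 'b': 'dictionary'}
```

A greedy pass like this considers one switch at a time and optimizes only the total cost; it does not balance gains across individual queries, which is the role the text assigns to the LP formulation.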

Similarly, data tiering promises to reduce the amount of data in main memory by moving infrequently used data to cheaper and more elastic lower memory and secondary storage tiers. The challenge is to find an optimal balance between performance and costs. We propose an automatic, LP-based tiering approach for Hyrise that addresses this challenge. Our approach tracks the frequency and patterns of data accesses to identify rarely used data, which is then moved to secondary memory tiers (e.g., NVM or SSDs). This method is applicable to column selection problems in general and ensures Pareto efficiency for varying memory budgets. Since aspects such as the selectivity, size, and frequency of queries are taken into account, the resulting configurations outperform those of other heuristics.
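The idea of using tracked accesses to decide which columns stay in main memory can be sketched as follows. This is a simplified greedy stand-in, not the LP-based approach described above: it ranks columns by accesses per byte and ignores selectivity and access patterns. The function name, column names, and sizes are hypothetical.

```python
from collections import Counter


def place_columns(access_log, column_sizes, dram_budget):
    """Split columns into a DRAM tier and a secondary tier.

    access_log: iterable of column names, one entry per observed access
    column_sizes: {column: size_in_bytes}, sizes must be positive
    Returns (dram_columns, secondary_columns).
    """
    counts = Counter(access_log)
    # Frequently accessed, small columns are the cheapest to keep in DRAM.
    ranked = sorted(
        column_sizes,
        key=lambda column: counts[column] / column_sizes[column],
        reverse=True,
    )
    dram, secondary, used = [], [], 0
    for column in ranked:
        if used + column_sizes[column] <= dram_budget:
            dram.append(column)
            used += column_sizes[column]
        else:
            secondary.append(column)  # rarely used or too large for the budget
    return dram, secondary


sizes = {"o_orderkey": 40, "o_comment": 400, "o_totalprice": 80, "o_clerk": 120}
log = ["o_orderkey"] * 5 + ["o_totalprice"] * 3 + ["o_comment"]
print(place_columns(log, sizes, dram_budget=150))
# (['o_orderkey', 'o_totalprice'], ['o_comment', 'o_clerk'])
```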

• Boissier, M., Weisgut, M., Rabl, T.: Compression in Main Memory Database Systems: Cost and Performance Trade-Offs of Workload-Driven Data Encoding. Datenbanksysteme für Business, Technologie und Web (BTW). pp. 779–786 (2025). [ DOI ] [ Download ]
• Richly, K., Schlosser, R., Boissier, M.: Budget-Conscious Fine-Grained Configuration Optimization for Spatio-Temporal Applications. Proceedings of the VLDB Endowment 15(13). pp. 4079–4092 (2022). [ DOI ] [ Download ]
• Boissier, M.: Robust and Budget-Constrained Encoding Configurations for In-Memory Database Systems. Proceedings of the VLDB Endowment 15(4). pp. 780–793 (2021). [ DOI ] [ Download ]

Modern Hardware

To exploit recent hardware developments and increase efficiency, we are evaluating how modern hardware can be best used in modern database systems, with Hyrise serving as a research platform. One focus is memory heterogeneity and disaggregation. We investigate workload-aware data placement across local and remote memory tiers, enabled by fast interconnects such as CXL. As remote memory introduces higher access latencies, we also study approaches to hide these latencies, such as software prefetching. Another focus concerns accelerating classical database operators using modern CPU vector capabilities. We exploit wide vector registers to parallelize core operations, such as the sort-merge join.

• Weisgut, M., Ritter, D., Tözün, P., Benson, L., Rabl, T.: CXL Memory Performance for In-Memory Data Processing. Proceedings of the VLDB Endowment 18(9). pp. 3119–3133 (2025). [ DOI ] [ Download ]
• Riekenbrauck, N., Weisgut, M., Lindner, D., Rabl, T.: A Three-Tier Buffer Manager Integrating CXL Device Memory for Database Systems. Joint International Workshop on Big Data Management on Emerging Hardware and Data Management on Virtualized Active Systems @ ICDE 2024 (2024)

Data Dependencies

Efficient query optimization is usually based on metadata, such as cardinalities and other basic statistics. More advanced techniques consider data dependency types, such as functional, uniqueness, order, or inclusion constraints / dependencies. We identified 60 query optimization techniques for application areas like join, selection, sorting and set operations in the literature that are based on data dependencies.

Toward an efficient implementation and integration into commercial database systems, we laid out a vision for a workload-driven discovery system for query optimization. The dependency discovery is considered “lazy” since only those data dependency candidates are considered that are relevant for the observed workload. Our prototypical implementation in Hyrise identifies relevant data dependency candidates based on executed query plans and dynamically validates the candidates against the database, leading to performance improvements.
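The validation step of such a lazy discovery can be illustrated with a short sketch. It assumes that the candidates have already been derived from executed query plans, which is not shown here, and it covers only two dependency types. The candidate tuple format, the function names, and the sample rows are hypothetical and do not reflect the Hyrise implementation.

```python
def is_unique(rows, column):
    """A uniqueness candidate holds if no value occurs twice."""
    values = [row[column] for row in rows]
    return len(set(values)) == len(values)


def holds_fd(rows, determinant, dependent):
    """A functional dependency determinant -> dependent holds if every
    determinant value maps to exactly one dependent value."""
    seen = {}
    for row in rows:
        expected = seen.setdefault(row[determinant], row[dependent])
        if expected != row[dependent]:
            return False
    return True


def validate_candidates(rows, candidates):
    """Keep only the workload-relevant candidates that hold on the data.

    candidates: list of ("unique", column) or ("fd", determinant, dependent)
    """
    validators = {"unique": is_unique, "fd": holds_fd}
    return [c for c in candidates if validators[c[0]](rows, *c[1:])]


rows = [
    {"id": 1, "zip": "14482", "city": "Potsdam"},
    {"id": 2, "zip": "14482", "city": "Potsdam"},
    {"id": 3, "zip": "10115", "city": "Berlin"},
]
candidates = [
    ("unique", "id"),
    ("unique", "zip"),
    ("fd", "zip", "city"),
    ("fd", "city", "id"),
]
print(validate_candidates(rows, candidates))
# [('unique', 'id'), ('fd', 'zip', 'city')]
```

Because only candidates that appear in the observed workload are checked, the validation cost scales with the number of relevant candidates rather than with all possible column combinations.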

• Lindner, D., Ritter, D., Naumann, F.: Unleashing Data Dependency-based Query Optimization. International Conference on Extending Database Technology, EDBT. pp. 516–529 (2026). [ DOI ] [ Download ]

Publications

Publications – Database Group

Unleashing Data Dependency-based Query Optimization
Lindner, D., Ritter, D., Naumann, F.
International Conference on Extending Database Technology, EDBT. pp. 516–529 (2026)
[ DOI ] [ Download ]
CXL Memory Performance for In-Memory Data Processing
Weisgut, M., Ritter, D., Tözün, P., Benson, L., Rabl, T.
Proceedings of the VLDB Endowment 18(9). pp. 3119–3133 (2025)
[ DOI ] [ Download ]
Compression in Main Memory Database Systems: Cost and Performance Trade-Offs of Workload-Driven Data Encoding
Boissier, M., Weisgut, M., Rabl, T.
Datenbanksysteme für Business, Technologie und Web (BTW). pp. 779–786 (2025)
[ DOI ] [ Download ]
A Case for Ecological Efficiency in Database Server Lifecycles
Bodner, T., Boissier, M., Rabl, T., Salazar-Díaz, R., Schmeller, F., Strassenburg, N., Tolovski, I., Weisgut, M. and Yue, W.
Conference on Innovative Data Systems Research, CIDR (2025)
[ URL ] [ Download ]
A Three-Tier Buffer Manager Integrating CXL Device Memory for Database Systems
Riekenbrauck, N., Weisgut, M., Lindner, D., Rabl, T.
Joint International Workshop on Big Data Management on Emerging Hardware and Data Management on Virtualized Active Systems, ICDE (2024)
[ DOI ] [ Download ]
Budget-Conscious Fine-Grained Configuration Optimization for Spatio-Temporal Applications
Richly, K., Schlosser, R., Boissier, M.
Proceedings of the VLDB Endowment 15(13). pp. 4079–4092 (2022)
[ DOI ] [ Download ]
Separated Allocator Metadata in Disaggregated In-Memory Databases: Friend or Foe?
Weisgut, M., Ritter, D., Boissier, M., Perscheid, M.
1st Workshop on Composable Systems, COMPSYS@IPDPS, awarded as best paper (2022)
[ Download ]
Workload-driven, Lazy Discovery of Data Dependencies for Query Optimization
Kossmann, J., Lindner, D., Naumann, F., Papenbrock, T.
Proceedings of the Conference on Innovative Data Systems Research, CIDR (2022)
[ Download ]
Robust and Budget-Constrained Encoding Configurations for In-Memory Database Systems
Boissier, M.
Proceedings of the VLDB Endowment 15(4). pp. 780–793 (2021)
[ DOI ] [ Download ]
Data dependencies for query optimization: a survey
Kossmann, J., Papenbrock, T., Naumann, F.
The VLDB Journal. (2021)
[ DOI ] [ Download ]
Learned What-If Cost Models for Autonomous Clustering
Lindner, D., Loeser, A., Kossmann, J.
New Trends in Database and Information Systems - ADBIS 2021. pp. 3–13 (2021)
[ Download ]
A Cockpit for the Development and Evaluation of Autonomous Database Systems
Kossmann, J., Boissier, M., Dubrawski, A., Heseding, F., Mandel, C., Pigorsch, U., Schneider, M., Schniese, T., Sobhani, M., Tsayun, P., Wille, K., Perscheid, M., Uflacker, M., Plattner, H.
IEEE International Conference on Data Engineering, ICDE. pp. 2685–2688 (2021)
[ Download ]
Magic mirror in my hand, which is the best in the land? An Experimental Evaluation of Index Selection Algorithms
Kossmann, J., Halfpap, S., Jankrift, M., Schlosser, R.
Proceedings of the VLDB Endowment 13(11). pp. 2382–2395 (2020)
[ Download ]
Self-driving database systems: a conceptual approach
Kossmann, J., Schlosser, R.
Distributed and Parallel Databases. 38 (4), 795–817 (2020)
[ Download ]
Quantifying TPC-H Choke Points and Their Optimizations
Dreseler, M., Boissier, M., Rabl, T., Uflacker, M.
Proceedings of the VLDB Endowment 13(8). pp. 1206–1220 (2020)
[ Download ]
Hyrise Re-engineered: An Extensible Database System for Research in Relational In-Memory Data Management
Dreseler, M., Kossmann, J., Boissier, M., Klauck, S., Uflacker, M., Plattner, H.
International Conference on Extending Database Technology, EDBT. pp. 313–324 (2019)
[ Download ]
Workload-Driven and Robust Selection of Compression Schemes for Column Stores
Boissier, M., Jendruk, M.
International Conference on Extending Database Technology, EDBT. pp. 674–677 (2019)
[ Download ]
Storing STL Containers on NVM.
Dreseler, M.
Persistent Programming in Real Life (2019)
[ Download ]
A Case for Hardware-Supported Sub-Cache Line Accesses
Schmidt, C., Dreseler, M., Akin, B., Roy, A.
Data Management on New Hardware (DaMoN), in conjunction with SIGMOD (2018)
[ Download ]
Fused Table Scans: Combining AVX-512 and JIT to Double the Performance of Multi-Predicate Scans
Dreseler, M., Kossmann, J., Frohnhofen, J., Uflacker, M., Plattner, H.
Joint Workshop of HardBD & Active, in conjunction with ICDE (2018)
[ Download ]
Visual Evaluation of SQL Plan Cache Algorithms
Kossmann, J., Dreseler, M., Gasda, T., Uflacker, M., Plattner, H.
Australasian Database Conference (ADC) (2018)
[ Download ]
Adaptive Access Path Selection for Hardware-Accelerated DRAM Loads
Dreseler, M., Gasda, T., Kossmann, J., Uflacker, M., Plattner, H.
Australasian Database Conference (ADC) (2018)
[ Download ]
Hyrise-NV: Instant Recovery for In-Memory Databases using Non-Volatile Memory
Schwalb, D., Bk, G.K., Dreseler, M., S, A., Faust, M., Hohl, A., Berning, T., Makkar, G., Plattner, H., Deshmukh, P.
International Conference on Database Systems for Advanced Applications (DASFAA) (2016)
NVC-Hashmap: A Persistent and Concurrent Hashmap For Non-Volatile Memories
Schwalb, D., Dreseler, M., Uflacker, M., Plattner, H.
In-Memory Data Management Workshop (IMDM), in conjunction with VLDB (2015)
[ Download ]
Hyrise-R: Scale-out and Hot-Standby through Lazy Master Replication for Enterprise Applications
Schwalb, D., Kossmann, J., Faust, M., Klauck, S., Uflacker, M., Plattner, H.
Proceedings of the VLDB Workshop on In-Memory Data Management and Analytics (IMDM), in conjunction with VLDB (2015)
Composite Group-Keys: Space-efficient Indexing of Multiple Columns for Compressed In-Memory Column Stores
Faust, M., Schwalb, D., Plattner, H.
IMDM in conjunction with VLDB (2014)
[ DOI ]
Efficient Transaction Processing for Hyrise in Mixed Workload Environments
Schwalb, D., Faust, M., Wust, J., Grund, M., Plattner, H.
IMDM in conjunction with VLDB (2014)
An overview of HYRISE - a Main Memory Hybrid Storage Engine
Grund, M., Cudre-Mauroux, P., Krüger, J., Madden, S., Plattner, H.
IEEE Data Engineering Bulletin. (2012)
[ Download ]
Fast Lookups for In-Memory Column Stores: Group-Key Indices, Lookup and Maintenance
Faust, M., Krüger, J., Schwalb, D., Plattner, H.
ADMS (in conjunction with VLDB) (2012)
[ Download ]
A Demonstration of HYRISE—A Main Memory Hybrid Storage Engine
Grund, M., Cudre-Mauroux, P., Madden, S.
Proceedings of the VLDB Endowment 4(12). pp. 1434–1437 (2011)
[ Download ]
HYRISE—A Hybrid Main Memory Storage Engine
Grund, M., Krüger, J., Plattner, H., Zeier, A., Cudre-Mauroux, P., Madden, S.
Proceedings of the VLDB Endowment 4(2). pp. 105–116 (2011)
[ Download ]

Publications – External

Towards High-performance and Trusted Cloud DBMSs
Lutsch, A., El-Hindi, M., István, Z., Binnig, C.
Datenbank-Spektrum 25(1): 39-50 (2025)
Toleo: Scaling Freshness to Tera-scale Memory Using CXL and PIM
Dong, J., Rosenblum, J., Narayanasamy, S.
ASPLOS (4): 313-328 (2024)
Tao: Improving Resource Utilization while Guaranteeing SLO in Multi-tenant Relational Database-as-a-Service
Liu, H., Li, R., Zhang, Z., Tang, B.
Proc. ACM Manag. Data 2(4): 205:1-205:26 (2024)
Moses: Heap Partitioning for Semantic Data Tiering
Eberhardt, F., Grapentin, A., Köhler, S., Grzelka, F., Hönig, T., Polze, A.
DIMES@SOSP. 25-32 (2024)
An Adaptive Column Compression Family for Self-Driving Databases
Fehér, M., Lucani, D., Chatzigeorgiou, I.
ADMS@VLDB. pp. 47-57 (2022)