Paving the Way for Self-Managing Database Systems
The performance of a database system depends on its configuration. Modern database systems offer many interdependent configuration options so that they can process varying workloads from different domains on heterogeneous hardware. The number of possible configurations grows exponentially with the number of available options. Thus, the already expensive configuration process exceeds the capabilities of human database administrators. To tackle this issue, self-managing database systems use workload-driven optimization and machine learning techniques to automate configuration.
We focus on three specific challenges of self-managing database systems: (i) system integration, (ii) index selection, and (iii) cost estimation. (i) System integration: Database management systems (DBMSs) were not designed with self-managing capabilities in mind. We propose a generalized framework that makes a DBMS self-managing by providing components for workload monitoring, forecasting, and tuning. (ii) Index selection: Diverse and volatile workloads from different applications complicate the selection of performance-enhancing indexes. We developed an efficient and scalable index selection approach that accounts for index interaction and reconfiguration costs while running faster than state-of-the-art algorithms. (iii) Cost estimation: Knowledge of query costs is crucial for determining efficient query execution plans. Self-managing systems must quantify the cost impact of the options available to them in order to select the most beneficial one. We generate highly accurate cost estimates by continuously training estimation models on actual runtime observations.
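As an illustration of the idea behind (iii), the following minimal sketch shows a cost model that is updated after every observed query execution. It is not the implementation used in Hyrise: the linear model, the normalized least-mean-squares update, the feature set, and all names (OnlineCostModel, features_of, replay) are assumptions chosen for brevity.

```python
from dataclasses import dataclass, field
from typing import Iterable, List, Sequence, Tuple


@dataclass
class OnlineCostModel:
    """Hypothetical linear cost model, refined one observation at a time."""

    num_features: int
    step_size: float = 0.5  # relaxation factor of the normalized update
    weights: List[float] = field(init=False)

    def __post_init__(self) -> None:
        # Start without any knowledge: every estimate is 0 until trained.
        self.weights = [0.0] * self.num_features

    def predict(self, features: Sequence[float]) -> float:
        """Estimate the runtime (in ms) for a feature vector."""
        return sum(w * x for w, x in zip(self.weights, features))

    def observe(self, features: Sequence[float], runtime_ms: float) -> float:
        """Update the model with a measured runtime and return the
        estimation error made before the update."""
        error = runtime_ms - self.predict(features)
        # Normalized LMS step: scale by the squared feature norm so the
        # update stays stable regardless of feature magnitudes.
        norm = sum(x * x for x in features) + 1e-9
        scale = self.step_size * error / norm
        self.weights = [w + scale * x for w, x in zip(self.weights, features)]
        return error


def features_of(rows_scanned: int, uses_index: bool) -> List[float]:
    """Assumed feature set: bias term, scanned rows (millions), index flag."""
    return [1.0, rows_scanned / 1_000_000, 1.0 if uses_index else 0.0]


def replay(model: OnlineCostModel,
           observations: Iterable[Tuple[Sequence[float], float]],
           passes: int = 1) -> None:
    """Feed (features, measured runtime) pairs to the model repeatedly."""
    observations = list(observations)
    for _ in range(passes):
        for features, runtime_ms in observations:
            model.observe(features, runtime_ms)
```

In a running system, observe would be called with the measured runtime of each executed query, so that the estimates follow changes in data and workload. A tuning component could then call predict to compare candidate configurations before applying one of them.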
Our contributions pave the way for self-managing database systems by providing solutions for core challenges in this field. The aforementioned techniques are implemented in the research database system Hyrise.