Hasso-Plattner-Institut
Prof. Dr. Tilmann Rabl
 

About the Speaker

Prof. Dr.-Ing. Wolfgang Lehner is a professor at TU Dresden, where he leads the database technology group as well as the Institute of Systems Architecture. After earning his Ph.D. (Dr.-Ing.) from the University of Erlangen-Nuremberg in 1998, he joined the Business Intelligence group at IBM in California, where he continued to work on optimizing different parts of database systems, as he had in his dissertation. In 2001 he finished his habilitation at the University of Halle-Wittenberg, and in 2002 he joined TU Dresden. In this talk, he presents some of the research he and his chair have been conducting on optimizing the optimizer.

About the Talk

The traditional cost-based query optimizer approach enumerates different execution plans for each individual query, assesses each plan with costs, and selects the plan that promises the lowest cost for execution. However, as we all know, the optimal execution plan is not always selected. To steer the optimizer in the right direction, many database systems provide optimizer hints. These hints can be set for workloads, individual queries, or even for query fragments.

Within this talk, we first show the potential of optimizer hinting by presenting the results of a comprehensive and in-depth evaluation using three benchmarks and two different versions of the open-source database system PostgreSQL. Subsequently, we highlight that query optimizer hinting is a nontrivial challenge and show two potential solutions: On the one hand, we propose TONIC, a novel cardinality-estimation-free extension for generic SPJ query optimizers. TONIC follows a learning-based approach and revises operator decisions for arbitrary join paths based on learned query feedback. To continuously capture and reuse optimal operator selections, we introduce a lightweight yet powerful Query Execution Plan Synopsis (QEP-S). On the other hand, we provide insights into FASTgres, a context-aware classification strategy for a more holistic hint set prediction. In end-to-end evaluations, both strategies show significant reductions of benchmark runtimes.

Optimizing the Optimizer

written by Menatalla Elnemr, Nicolas Christian Schilling and Felix Schulz

Goal, Problem, Solution

The talk discusses several projects that aim to improve the performance and (especially) the robustness of query optimization. While there are a multitude of good proposals to improve query optimization (e.g., with machine learning), many struggle with dynamic data, changing database schemas, and delivering consistent performance. The speaker introduces four different projects that approach these issues by looking at cardinality estimation (estimating the size of intermediate results), machine learning, and better infrastructure for the development and testing of query optimizers.

Background

The speaker started by explaining the fundamentals and the relevance of query optimization. First, Lehner introduced the basic steps of query processing:

  1. Generating a Logical Query Plan
  2. Rewrite and Transformation

Logical Query Plan

Step one is concerned with the optimization of a given query without considering the underlying hardware. It only covers the transformation of a query (which is basically a string) into a logical query plan. This includes checking whether the query is executable at all by verifying its structural integrity (integrity control), as well as checking whether the given user has access to the data needed to answer the query (access control). At the beginning of the process we have a query (like “SELECT X FROM Z”), and at the end we have a plan in the form of relational algebra, consisting of operators like join, selection, etc.
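To make this concrete, here is a minimal sketch (ours, not from the talk) that models a logical plan for a query like “SELECT x FROM z WHERE y > 10” as a small relational-algebra tree in Python; the table and column names are made up:

```python
from dataclasses import dataclass

# Minimal relational-algebra node types for a logical query plan.
@dataclass
class Relation:
    name: str

@dataclass
class Selection:          # sigma: filter rows by a predicate
    predicate: str
    child: object

@dataclass
class Projection:         # pi: keep only the listed columns
    columns: list
    child: object

# "SELECT x FROM z WHERE y > 10" becomes pi_x( sigma_{y > 10}( z ) ):
plan = Projection(["x"], Selection("y > 10", Relation("z")))
print(plan)
```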

Rewrite and Transformation

Step two is concerned with optimizing the query plan derived from step one, as well as transforming it into a query execution plan. The latter operates on the hardware level; the relationship is similar to that between Java source code and compiled bytecode, meaning it contains instructions targeted at the physical layer that implement the optimized query plan. Note that there might not be a one-to-one relationship between a logical operator and a physical operator (e.g., a logical operator might be implemented by several physical operators).
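As an illustration of why this mapping matters, the following sketch (our example, not from the talk) implements one logical join with two common physical operators; a cost-based optimizer would pick between such alternatives depending on input sizes and available indexes:

```python
# One logical join, several possible physical implementations.

def nested_loop_join(r, s, key_r, key_s):
    # O(|R| * |S|): compare every pair of tuples.
    return [(a, b) for a in r for b in s if a[key_r] == b[key_s]]

def hash_join(r, s, key_r, key_s):
    # O(|R| + |S|): build a hash table on R, probe it with S.
    table = {}
    for a in r:
        table.setdefault(a[key_r], []).append(a)
    return [(a, b) for b in s for a in table.get(b[key_s], [])]

r = [{"id": 1}, {"id": 2}]
s = [{"r_id": 2}, {"r_id": 3}]
# Same logical result, different physical cost profiles:
assert nested_loop_join(r, s, "id", "r_id") == hash_join(r, s, "id", "r_id")
```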

Are we polishing a round ball?

The speaker then poses the question of whether query optimization is a done deal, since it has been researched since the early 1970s. The field contains a lot of research, and many strategies (e.g., for join ordering) are well known and implemented. Lehner then answers the question by reframing it in light of modern query optimization: “Is it actually a ball?”

The world is polystore

The environment of database operations has changed greatly in the last decades. The speaker explains that even the application side of things has become very complex: we have to deal with very complex data structures (no flat tables, inheritance hierarchies, etc.), non-relational data (no or only loose schemas, e.g., JSON, XML, graphs), and the sheer size of databases. Lehner expounds that nowadays even a medium-sized database might contain 100,000 database objects. This results in very complex queries that contain multiple (10 to 20) joins. The problem this poses for query optimization is that the performance of “traditional” optimization strategies varies greatly, is therefore not reliable, and leads to tail latencies. When comparing the response times of different database queries, one will notice that most queries can be processed very quickly, while the response times of a few queries take much longer. In one example (shown in Figure 1), this leads to tail latencies of 45 up to 130 seconds, while most response times are well under that.

Figure 1: Page 18 from the slides

Estimation of cardinalities

One very problematic issue for query optimization is the estimation of intermediate results, or cardinalities. When a query is complex and the final result depends on the results of subordinate queries, these intermediate results might be a lot bigger than expected. Consequently, the challenge of modern query optimization is not only improving single queries but also making query plans robust. There are a multitude of promising concepts to improve this performance, but they still lack generality and stability. For example, if the query plan underestimates cardinalities, the performance of the query will suffer greatly and will be well below possible speeds.
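To see why underestimation is so damaging, consider a toy calculation (ours, not from the talk): textbook estimators assume predicates are independent and multiply their selectivities, which drifts far from reality on correlated columns, and the error compounds across joins:

```python
# Toy example: independence assumption vs. correlated data.
rows = 1_000_000
sel_city = 0.01      # WHERE city = 'Dresden'
sel_zip = 0.01       # WHERE zip LIKE '01%'

# Textbook estimate multiplies selectivities (assumes independence):
estimate = rows * sel_city * sel_zip            # 100 rows

# But city and zip code are highly correlated: almost every row that
# matches the city also matches the zip prefix.
actual = rows * sel_city                        # ~10,000 rows

print(f"estimated {estimate:.0f} vs. actual ~{actual:.0f} rows")
# A 100x underestimate; after a few joins the error compounds
# multiplicatively and the chosen plan can become disastrous.
```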

Projects

UES

UES is an approach that focuses on the intermediate results of queries (since most database research is based on estimates for base tables). UES is concerned with join ordering and physical operator selection. It targets modern databases, which hold a multitude of different data types, changing data, and even changing data schemas. The desired outcome of UES is consistency, or robustness (in contrast to the fluctuations discussed in the previous section). Traditional cardinality estimates are very likely to over- or underestimate intermediate results (point estimates), and they do not consider uncertainty. In contrast, UES uses a sketch-based approach, giving a loose but robust upper bound for cardinalities.

UES gives us the maximum “blow-up”, i.e., the largest possible number of join partners produced by a given join. This method is robust. Using these upper bounds actually helps reduce the response time of the problematic long-running queries shown in Figure 2. Unfortunately, it also increases the response times of fast queries. The result is, however, robust, and the tail latency can be reduced by orders of magnitude. This means that UES produces consistent execution times but slows down some queries, because the query optimizer chooses very pessimistic plans and loses opportunities (that might otherwise result in high tail latency). UES works well in dealing with outliers, but it is not a sufficient overall solution for good intermediate estimates.
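The following sketch illustrates the flavor of such an upper bound (our paraphrase of the idea, not the exact UES formula): each tuple of one relation can match at most max-frequency-many tuples of the other, so the true join size can never exceed the bound:

```python
from collections import Counter

def join_upper_bound(r_keys, s_keys):
    """A simple, sound upper bound on |R JOIN S| over equal keys.

    Each tuple of R matches at most max-frequency(S) tuples of S and
    vice versa, so the real join size never exceeds this bound.
    (A paraphrase of the upper-bound idea, not the exact UES formula.)
    """
    mf_r = max(Counter(r_keys).values())
    mf_s = max(Counter(s_keys).values())
    return min(len(r_keys) * mf_s, len(s_keys) * mf_r)

r = [1, 1, 2, 3]        # join-key values in R
s = [1, 2, 2, 2, 4]     # join-key values in S
print(join_upper_bound(r, s))   # bound: min(4*3, 5*2) = 10; actual: 5
```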

TONIC

Figure 2: Page 40 from the slides

The TONIC approach is in the area of physical operator selection. TONIC learns from past physical operator selections in order to improve query execution plans. The learned information is stored in a prefix tree over the join orders of all queries (the Query Execution Plan Synopsis, QEP-S). It clusters similar join orders and annotates each node of the tree with a cost. The initial cost estimation is done using an existing cost model and is then refined. Each new query is compared with existing solutions for similar past queries. If a lower-cost alternative for a physical operator is found, it replaces the existing one. After execution, TONIC compares the actual operator cost of the join order with the estimated cost; if new information is found, the tree is updated for future use. In performance tests carried out on the Postgres database, TONIC was able to improve performance. Nevertheless, the results with TONIC are below the theoretical optimum, as can be seen in Figure 2.
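A minimal sketch of the prefix-tree idea follows; the class layout and names are our simplification for illustration, not TONIC's actual implementation:

```python
class QepsNode:
    """One node per join step; children are keyed by the next table joined."""
    def __init__(self):
        self.children = {}
        self.best_operator = None      # e.g. "hash_join"
        self.best_cost = float("inf")

class QepSynopsis:
    """Prefix tree over join paths, caching the cheapest operator seen."""
    def __init__(self):
        self.root = QepsNode()

    def record(self, join_path, operators, costs):
        """Store per-step operators and measured costs after execution."""
        node = self.root
        for table, op, cost in zip(join_path, operators, costs):
            node = node.children.setdefault(table, QepsNode())
            if cost < node.best_cost:          # keep the cheapest choice
                node.best_cost, node.best_operator = cost, op

    def advise(self, join_path):
        """Reuse past operator decisions along a known join-path prefix."""
        node, advice = self.root, []
        for table in join_path:
            if table not in node.children:
                break
            node = node.children[table]
            advice.append(node.best_operator)
        return advice

# Join path starting from an implied base table A: first join in B, then C.
syn = QepSynopsis()
syn.record(["B", "C"], ["hash_join", "nested_loop"], [120.0, 40.0])
syn.record(["B", "C"], ["merge_join", "nested_loop"], [90.0, 55.0])
print(syn.advise(["B", "C"]))   # ['merge_join', 'nested_loop']
```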

FASTgres: Can we use ML instead?

Figure 3: Page 53 from the slides

After introducing UES and TONIC, the idea behind FASTgres is simple: can machine learning reach performance improvements similar to the algorithmic approaches? To make the problem more suitable to machine learning, they predict hints instead of estimates or operators. A hint can be used to configure the optimizer's behavior for a specific query; for example, the use of a nested-loop join can be forced or forbidden. They validated the idea by exhaustively testing all 64 combinations of hints and found that the optimal hint set improved performance on average by a factor of more than 2. Notably, each query's optimal hint set varies considerably, underscoring the need for a context-aware model.
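For illustration, the following sketch enumerates hint sets over PostgreSQL's planner switches; assuming the six binary enable_* settings below (which are real PostgreSQL settings) are the hints in question, exactly 2^6 = 64 combinations result:

```python
from itertools import product

# Six binary PostgreSQL planner switches (real settings); assuming these
# are the hints meant in the talk, 2^6 = 64 hint sets result.
FLAGS = ["enable_nestloop", "enable_hashjoin", "enable_mergejoin",
         "enable_seqscan", "enable_indexscan", "enable_indexonlyscan"]

def hint_sets():
    """Yield all 64 on/off assignments as lists of SET statements."""
    for bits in product(["on", "off"], repeat=len(FLAGS)):
        yield [f"SET {flag} = {value};" for flag, value in zip(FLAGS, bits)]

all_sets = list(hint_sets())
print(len(all_sets))     # 64
print(all_sets[1])       # everything on except index-only scans
```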

“Finding the optimal hint set for each query is not straightforward.”

FASTgres achieves this by training multiple small models for specific query contexts, using gradient boosting to predict hint sets based on a featurization of the query's predicates. They evaluate their approach on the Stack and JOB benchmarks using Postgres 12 and 14. They get good results even with little training data, especially on Stack, while the low number of queries in the JOB benchmark makes training multiple models per context difficult; there, a version of FASTgres with only a single model performs best.
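As a toy sketch of this idea (our simplification, not the actual FASTgres code), a gradient-boosting classifier can map featurized predicates to the best-known hint set, here encoded as an integer with one bit per flag; the featurization below is invented for illustration:

```python
from sklearn.ensemble import GradientBoostingClassifier

# Hypothetical featurization: one number per predicate slot (e.g. bucketed
# literal values); the real FASTgres featurization is more elaborate.
X_train = [[0.1, 3.0], [0.2, 3.1], [0.9, 7.5], [0.8, 7.2]]
y_train = [0b101010, 0b101010, 0b000111, 0b000111]  # best-known hint sets

model = GradientBoostingClassifier().fit(X_train, y_train)
print(bin(model.predict([[0.15, 3.05]])[0]))  # likely 0b101010
```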

Figure 4: Page 55 from the slides

PostBOUND: Develop and Evaluate Novel Optimizations

When developing a novel optimizer component such as FASTgres, a lot of boilerplate code is needed, and fair comparisons to existing optimizer components are hard to achieve. PostBOUND is introduced as a framework to tackle these problems. Its goal is to make the development of new optimizer ideas as easy as possible while using actual database systems to execute evaluations that are reproducible and comparable.

Figure 5: Page 58 from the slides

Python is used to interact with the framework, which provides the infrastructure around query optimization. Common data structures like join graphs are already implemented, and the scope of the novel optimization algorithm can vary from the entire physical plan to just tweaking the cardinalities. To enforce the optimization decisions on different database systems, the PostBOUND backends utilize hinting and other tricks depending on which database was chosen (e.g., Postgres, MySQL, etc.). This system enables comparable evaluations while minimizing the code needed for new experiments.
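The talk did not show code, but conceptually an experiment might look like the following sketch; every class and method name here is invented for illustration and is not PostBOUND's real API:

```python
# Hypothetical sketch of the workflow PostBOUND enables; all names below
# are invented for illustration and are NOT the framework's real API.

class MyCardinalityEstimator:
    """A custom component: only cardinalities are overridden; the rest of
    the optimizer pipeline stays at the database's defaults."""
    def estimate(self, join_graph_node):
        return 1_000  # plug a novel estimator in here

def run_experiment(queries, estimator, backend="postgres"):
    for q in queries:
        # The framework would translate the estimator's decisions into
        # backend-specific hints and execute the query for timing.
        print(f"[{backend}] would execute with hints derived from "
              f"{type(estimator).__name__}: {q}")

run_experiment(["SELECT * FROM a JOIN b ON a.id = b.a_id"],
               MyCardinalityEstimator())
```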

Conclusion

The speaker demonstrated that there is still a lot of potential in researching query optimization. He showed that consistent query runtimes are reachable and that modern optimization approaches need to address non-robust optimization, the diversity of data structures, dynamic data, and changing database schemas. He presented four projects to tackle these challenges:

• UES uses pessimistic but robust estimates for cardinalities to deal with long-running queries.
• TONIC improves query execution plans by storing past information in a prefix tree and comparing new queries with past ones.
• FASTgres uses machine learning and query hinting to configure the optimizer's behavior to use efficient join and scan operators.
• PostBOUND is a framework that facilitates the development of new optimization approaches and their comparison.

Sources

• Hertzschuch, Axel (2023): Robust Query Optimization for Analytical Database Systems. Dissertation, Technische Universität Dresden
• Lehner, Wolfgang (2024): On the Potential of Optimizing the Optimizer. Lecture held at the HPI Lecture Series, January 23rd, 2024