Our latest paper by Jan Kossmann, Daniel Lindner, as well as Thorsten Papenbrock and Felix Naumann from the Information Systems Group investigates how data dependencies can be discovered to be used for query optimization. Please feel free to contact Jan Kossmann for questions and discussions.
Effective query optimization is the backbone of relational database systems; query optimizers use all manners of tricks, from specialized auxiliary data structures over caching and batch processing to the use of secondary metadata. Nevertheless, these systems do not fully exploit the potential of data dependencies, such as functional dependencies, although dozens of dependency-based query optimization techniques exist. This disregard occurs because the required data dependencies are, in practice, hard to find and hard to maintain.
This paper presents our vision of a workload-driven, lazy dependency discovery system for query optimization. We propose a lightweight approach that identifies relevant data dependency candidates based on actually executed query plans, validates the candidates dynamically against the database, and maintains the results using different strategies that also exploit concepts of columnar DBMSs. Our evaluation demonstrates the feasibility of this approach and the potential of dependency-based optimizations, using a prototypical implementation in Hyrise that implements three exemplary dependency-based optimization techniques. For example, after automatic dependency discovery, these optimizations reduce the Join Order Benchmark's execution time by 27%.