Our submission entitled "Workload-driven, Lazy Discovery of Data Dependencies for Query Optimization" was accepted to the CIDR 2022 conference.
Workload-driven, Lazy Discovery of Data Dependencies for Query Optimization
Jan Kossmann, Felix Naumann, Daniel Lindner, Thorsten Papenbrock
Effective query optimization is the backbone of relational database systems; query optimizers use all manners of tricks, from specialized auxiliary data structures over caching and batch processing to the use of secondary metadata. Nevertheless, these systems do not fully exploit the potential of data dependencies, although dozens of such dependency-based query optimization techniques exist. This disregard occurs because the required data dependencies are, in practice, hard to find and hard to maintain.
This paper presents our vision of a workload-driven, lazy dependency discovery system for query optimization. We propose a lightweight approach that identifies relevant data dependency candidates based on actually executed query plans, validates the candidates dynamically against the database, and maintains the results using different strategies that also exploit concepts of columnar DBMSs. Our evaluation demonstrates the feasibility of this approach and the potential of dependency-based optimizations using a prototypical implementation in Hyrise that implements three exemplary dependency-based optimization techniques.