Tobias Bleifuß, Sebastian Kruse and Felix Naumann
Our paper "Efficient Denial Constraint Discovery with Hydra" has been accepted for presentation at the 44th International Conference on Very Large Data Bases (VLDB 2018). VLDB 2018 will take place in Rio de Janeiro, Brazil, from August 27th to August 31st, 2018.
Denial constraints (DCs) are a generalization of many other integrity constraints (ICs) widely used in databases, such as key constraints, functional dependencies, or order dependencies. Therefore, they can serve as a unified reasoning framework for all of these ICs and express business rules that cannot be expressed by the more restrictive IC types. The process of formulating DCs by hand is difficult, because it requires not only domain expertise but also database knowledge, and due to DCs' inherent complexity, this process is tedious and error-prone. Hence, an automatic DC discovery is highly desirable: we search for all valid denial constraints in a given database instance. However, due to the large search space, the problem of DC discovery is computationally expensive.
We propose a new algorithm Hydra, which overcomes the quadratic runtime complexity in the number of tuples of state-of-the-art DC discovery methods. The new algorithm's experimentally determined runtime grows only linearly in the number of tuples. This results in a speedup by orders of magnitude, especially for datasets with a large number of tuples. Hydra can deliver results in a matter of seconds that to date took hours to compute.