Detecting Maximum Inclusion Dependencies without Candidate Generation

Inclusion dependencies (INDs) within and across databases are important relationships for many applications in data integration, schema (re-)design, integrity checking, and query optimization. Existing techniques for detecting all INDs need to generate IND candidates and test their validity against the given data instance. The major disadvantage of this approach is that the number of data accesses, measured in both SQL queries and I/O operations, grows exponentially. We introduce Mind2, a new approach for detecting n-ary INDs (n > 1) without any candidate generation. Mind2 implements a new characterization of the maximum INDs that we develop in this paper. This characterization is based on set operations defined on certain metadata, which Mind2 generates with a number of database accesses equal to only twice the number of valid unary INDs. Thus, Mind2 eliminates the exponential number of data accesses needed by existing approaches. Furthermore, our experiments show that Mind2 is significantly more scalable than hypergraph-based approaches.
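
To make the terminology concrete, the following minimal Python sketch checks INDs on two toy in-memory relations. It illustrates the generate-and-test style that the abstract attributes to existing techniques, not the Mind2 characterization itself. The relations `orders` and `customers`, their attribute names, and the helpers `project` and `ind_holds` are all hypothetical and chosen only for illustration.

```python
from itertools import combinations

def project(rows, cols):
    """Return the set of distinct tuples of `rows` restricted to `cols`."""
    return {tuple(row[c] for c in cols) for row in rows}

def ind_holds(r_rows, r_cols, s_rows, s_cols):
    """Check R[r_cols] ⊆ S[s_cols]: every projected R tuple occurs in S."""
    return project(r_rows, r_cols) <= project(s_rows, s_cols)

# Hypothetical toy relations (assumed data, not from the paper).
orders = [
    {"cust": 1, "city": "Berlin"},
    {"cust": 2, "city": "Paris"},
]
customers = [
    {"id": 1, "town": "Berlin"},
    {"id": 2, "town": "Paris"},
    {"id": 3, "town": "Rome"},
]

# Step 1: test every unary candidate orders[a] ⊆ customers[b].
unary = [
    (a, b)
    for a in ("cust", "city")
    for b in ("id", "town")
    if ind_holds(orders, [a], customers, [b])
]

# Step 2: build binary candidates from pairs of valid unary INDs with
# distinct attributes on both sides, then test each one against the data.
# Each test is one more pass over the data, which is the cost that
# candidate-based approaches pay as the arity grows.
binary = []
for (a1, b1), (a2, b2) in combinations(unary, 2):
    if a1 != a2 and b1 != b2:
        if ind_holds(orders, [a1, a2], customers, [b1, b2]):
            binary.append(((a1, a2), (b1, b2)))
```

On this toy data the two valid unary INDs combine into one valid binary IND, `orders[cust, city] ⊆ customers[id, town]`. With more attributes, the number of candidates to test can grow quickly with the arity, which is the data-access cost the abstract describes.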
Tags Data_analysis Data_integration Data_mining Data_profiling Inclusion_dependency Mind2 its

