Hasso-Plattner-Institut
Prof. Dr. Felix Naumann
  
 

Publications (sorted in reverse chronological order)

A Hybrid Approach to Functional Dependency Discovery

Papenbrock, Thorsten; Naumann, Felix in Proceedings of the International Conference on Management of Data (SIGMOD), pages 821-833. New York, NY, USA, ACM, 2016.

Functional dependencies are structural metadata that can be used for schema normalization, data integration, data cleansing, and many other data management tasks. Despite their importance, the functional dependencies of a specific dataset are usually unknown and almost impossible to discover manually. For this reason, database research has proposed various algorithms for functional dependency discovery. None, however, is able to process datasets of typical real-world size, e.g., datasets with more than 50 attributes and a million records. We present a hybrid discovery algorithm called HyFD, which combines fast approximation techniques with efficient validation techniques in order to find all minimal functional dependencies in a given dataset. While operating on compact data structures, HyFD not only outperforms all existing approaches but also scales to much larger datasets.
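
To make the notion of a functional dependency concrete, the following is a minimal sketch of a naive check and a brute-force enumeration of minimal dependencies. It is not HyFD and does not reproduce the paper's approximation or validation techniques; the function names `holds` and `minimal_fds` and the sample records are assumptions made purely for illustration. The exhaustive search over attribute combinations is exponential in the number of attributes, which hints at why discovery on wide datasets is hard.

```python
from itertools import combinations


def holds(rows, lhs, rhs):
    """Return True if the functional dependency lhs -> rhs holds in rows.

    rows is a list of dicts; lhs is a tuple of attribute names; rhs is one
    attribute name. The dependency holds if records that agree on all lhs
    attributes also agree on rhs.
    """
    seen = {}
    for row in rows:
        key = tuple(row[a] for a in lhs)
        value = row[rhs]
        # setdefault stores the first rhs value seen for this lhs key.
        if seen.setdefault(key, value) != value:
            return False
    return True


def minimal_fds(rows, attributes):
    """Naively enumerate all minimal, non-trivial functional dependencies."""
    result = []
    for rhs in attributes:
        others = [a for a in attributes if a != rhs]
        found = []  # minimal left-hand sides found so far for this rhs
        # Visit candidates by increasing size so minimal ones come first.
        for size in range(len(others) + 1):
            for lhs in combinations(others, size):
                # Skip supersets of an already valid (smaller) lhs.
                if any(set(f) <= set(lhs) for f in found):
                    continue
                if holds(rows, lhs, rhs):
                    found.append(lhs)
        result.extend((lhs, rhs) for lhs in found)
    return result


# Hypothetical sample records, not taken from the paper.
rows = [
    {"zip": "14482", "city": "Potsdam", "name": "Ada"},
    {"zip": "14482", "city": "Potsdam", "name": "Bob"},
    {"zip": "10115", "city": "Berlin", "name": "Ada"},
]
fds = minimal_fds(rows, ["zip", "city", "name"])
```

In this sample, `zip` determines `city` and vice versa, while `name` determines neither, because two records share a name but differ in both other attributes.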
mod922.pdf
Tags discovery functional_dependencies hpi hybrid hyfd isg metadata parallel profiling