Data profiling is the process of automatically analyzing a given dataset to extract its metadata. Its techniques reveal intra-column properties, inter-column dependencies, and table-wide characteristics. Once determined, this metadata enables the data owner to detect errors, integrate other sources, normalize schemata, or define additional attribute properties.
The information systems group is currently developing a profiling platform called Metanome, which incorporates various algorithms for the discovery of Inclusion Dependencies, Functional Dependencies, and Unique Column Combinations, as well as various other metrics. In this seminar, we join the Metanome project and design advanced profiling algorithms for use in practice. More specifically, we examine different algorithms for discovering Functional Dependencies, improve their performance, and finally integrate them into Metanome.
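To make the notion of a Functional Dependency concrete, the following is a minimal sketch of how a single candidate dependency could be validated against a table. It is not part of Metanome and is not one of the discovery algorithms examined in the seminar; the function name `holds_fd`, the row representation, and the sample data are assumptions made purely for illustration. A dependency `lhs -> rhs` holds if any two rows that agree on all `lhs` columns also agree on the `rhs` column.

```python
from typing import Hashable, Sequence

# Hypothetical row representation: one dict per row, keyed by column name.
Row = dict[str, Hashable]


def holds_fd(rows: Sequence[Row], lhs: Sequence[str], rhs: str) -> bool:
    """Return True if the functional dependency lhs -> rhs holds in rows."""
    seen: dict[tuple, Hashable] = {}
    for row in rows:
        key = tuple(row[col] for col in lhs)
        value = row[rhs]
        # Remember the first rhs value seen for each lhs combination;
        # a different value for the same combination violates the dependency.
        if seen.setdefault(key, value) != value:
            return False
    return True


# Illustrative sample data (invented for this sketch).
rows = [
    {"zip": "14482", "city": "Potsdam", "name": "Alice"},
    {"zip": "14482", "city": "Potsdam", "name": "Bob"},
    {"zip": "10115", "city": "Berlin", "name": "Alice"},
]

print(holds_fd(rows, ["zip"], "city"))   # True
print(holds_fd(rows, ["name"], "city"))  # False
```

Checking one candidate takes a single pass over the table. The difficulty in discovering all Functional Dependencies lies in the number of candidates, which grows exponentially with the number of columns, so practical algorithms must avoid validating every candidate individually.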