We are happy to announce that our project "DynFD: Functional Dependency Discovery in Dynamic Datasets" has been accepted as a full paper at EDBT 2019.
Authors: Philipp Schirmer, Thorsten Papenbrock, Sebastian Kruse, Dennis Hempfing, Torben Meyer, Daniel Neuschäfer-Rube, Felix Naumann
Abstract: Functional dependencies (FDs) support various tasks for the management of relational data, such as schema normalization, data cleaning, and query optimization. However, while existing FD discovery algorithms regard only static datasets, many real-world datasets are constantly changing – and with them their FDs. Unfortunately, the computational hardness of FD discovery prohibits a continuous re-execution of those existing algorithms with every change of the data.
To this end, we propose DynFD, the first algorithm to discover and maintain functional dependencies in dynamic datasets. Whenever the inspected dataset changes, DynFD evolves its FDs rather than recalculating them. For this to work efficiently, we propose indexed data structures along with novel and efficient
update operations. Our experiments compare DynFD’s incremental mode of operation to the repeated re-execution of existing, static algorithms. They show that DynFD can maintain the FDs of dynamic datasets over an order of magnitude faster than its static counter-parts.