Hasso-Plattner-Institut
Prof. Dr. Felix Naumann
 

15.04.2015

Second paper accepted at VLDB 2015

Experiments and Analysis Paper

Functional Dependency Discovery: An Experimental Evaluation of Seven Algorithms

Thorsten Papenbrock, Jens Ehrlich, Jannik Marten, Tommy Neubert, Jan-Peer Rudolph, Martin Schönberg, Jakob Zwiener, Felix Naumann

Abstract.Functional dependencies are important and interesting meta-data, for instance used for schema normalization or data cleansing. The efficient discovery of functional dependencies in tables is a well-known challenge in database research and has seen several approaches. In this experimental paper we describe, evaluate, and compare the seven most cited and most important algorithms, all solving this same problem.

First, we classify the algorithms into three different categories, explaining their commonalities. The then following descriptions of the algorithms each point out their main ideas and describe additional details where the original papers were ambiguous or incomplete. Our evaluation of careful re-implementations of all algorithms spans a broad test space including synthetic and real-world data. We show that all functional dependency algorithms optimize for certain data characteristics. We conclude that the current ap- proaches scale surprisingly poorly, showing potential for future research.