Hasso-Plattner-Institut
Prof. Dr. Felix Naumann
  
 

Repeatability - FDs

This is a repeatability page for FD discovery algorithms. The algorithms are provided in the state in which their results were published, so they may not reflect the most recent versions of their implementations. To get more up-to-date versions of the algorithms, use the binaries provided here.

FD Algorithms

The efficient discovery of functional dependencies in tables is a well-known challenge in database research, and several approaches have been proposed. The following FD algorithms represent the current state of the art.

Lattice traversal algorithms:
  TANE (jar, code)
  FUN (jar, code)
  FD_Mine (jar, code)
  DFD (jar, code)
Difference- and agree-set algorithms:
  Dep-Miner (jar, code)
  FastFDs (jar, code)
Dependency induction algorithms:
  fdep (jar, code)

All seven FD algorithms can be executed with the data profiling tool Metanome. A Metanome build in version 0.0.2 can be downloaded here.
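
For orientation, a functional dependency X -> Y holds in a table if any two rows that agree on the attributes X also agree on the attributes Y. The following is a minimal sketch of a validity check for a single candidate FD. It is not one of the seven algorithms listed above, which search the space of candidates far more efficiently; the function name, the row representation (tuples indexed by column position), and the sample data are assumptions made for illustration.

```python
from typing import Hashable, Iterable, Sequence


def holds_functional_dependency(
    rows: Iterable[Sequence[Hashable]],
    lhs: Sequence[int],
    rhs: Sequence[int],
) -> bool:
    """Return True if the columns in lhs functionally determine those in rhs.

    Hypothetical helper: columns are given as positional indexes.
    """
    seen = {}  # maps each lhs value combination to the rhs values first seen with it
    for row in rows:
        key = tuple(row[c] for c in lhs)
        value = tuple(row[c] for c in rhs)
        # A second, different rhs value for the same lhs value violates the FD.
        if seen.setdefault(key, value) != value:
            return False
    return True


# Illustrative sample data (city, zip code, country), not one of the datasets below.
fd_sample = [
    ("Potsdam", "14482", "DE"),
    ("Potsdam", "14469", "DE"),
    ("Berlin", "10115", "DE"),
]

zip_determines_city = holds_functional_dependency(fd_sample, [1], [0])  # True
city_determines_zip = holds_functional_dependency(fd_sample, [0], [1])  # False
```

The check makes a single pass over the rows and stores one entry per distinct left-hand-side value. Discovery algorithms have to decide this question for a number of candidates that grows exponentially with the number of columns, which is why pruning strategies matter.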

OD Algorithms

The efficient discovery of order dependencies in tables is related to the discovery of functional dependencies. An order dependency states that ordering a table by one set of attributes also orders the table by another set of attributes.
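
The definition above can be illustrated with a small validity check for a single candidate order dependency. This is a minimal sketch, not the ORDER algorithm. It assumes one common formalization: attribute lists are compared lexicographically, and rows that tie on the left-hand side must also tie on the right-hand side. The function name, the row representation, and the sample data are hypothetical.

```python
from typing import Iterable, Sequence


def holds_order_dependency(
    rows: Iterable[Sequence],
    lhs: Sequence[int],
    rhs: Sequence[int],
) -> bool:
    """Return True if ordering rows by the lhs columns also orders them by rhs.

    Hypothetical helper: columns are positional indexes, and all values in a
    column are assumed to be mutually comparable.
    """
    # Project every row onto (lhs values, rhs values) and sort lexicographically.
    pairs = sorted(
        (tuple(row[c] for c in lhs), tuple(row[c] for c in rhs)) for row in rows
    )
    for (x_prev, y_prev), (x_next, y_next) in zip(pairs, pairs[1:]):
        if x_prev == x_next:
            # Ties on the left-hand side must be ties on the right-hand side.
            if y_prev != y_next:
                return False
        elif y_prev > y_next:
            # A strictly larger lhs value must not come with a smaller rhs value.
            return False
    return True


# Illustrative sample data (id, score, grade), not one of the datasets below.
od_sample = [
    (1, 10, "a"),
    (2, 20, "b"),
    (3, 20, "c"),
]

id_orders_score = holds_order_dependency(od_sample, [0], [1])  # True
score_orders_id = holds_order_dependency(od_sample, [1], [0])  # False
```

Checking adjacent pairs after sorting is sufficient because both conditions are transitive. Under this formalization, every valid order dependency also implies the corresponding functional dependency, which is one way the two discovery problems are related.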

ORDER (jar, code)

Datasets

All FD algorithms have been exhaustively tested on the following datasets:

Name                     Source         Columns  Rows       Size       FDs      ODs
iris                     uci            5        150        5 KB       4
balance-scale            uci            5        625        7 KB       1
chess                    uci            7        28,056     519 KB     1
abalone                  uci            9        4,177      187 KB     137      0
nursery                  uci            9        12,960     1,024 KB   1
breast-cancer-wisconsin  uci            11       699        20 KB      46       0
bridges                  uci            13       108        6 KB       142      0
echocardiogram           uci            13       132        6 KB       538      12
adult                    uci            14       48,842     3,528 KB   78       0
letter                   uci            16       20,000     695 KB     61       0
ncvoter                  alt.ncsbe.gov  19       1,000      151 KB     758
ncvoter                  ncsbe.gov      22       938,085    230 MB              >90
hepatitis                uci            20       155        8 KB       8,250    0
horse                    uci            27       300        25 KB      128,726  3
fd-reduced-30            dbtesma        30       250,000    69,581 KB  89,571
plista                   plista         63       1,000      568 KB     178,152
flight                   bts.gov        109      1,000      575 KB     982,631
flight                   bts.gov        20       500,000    71 MB               >1,545
uniprot                  uniprot.org    223      1,000      2,439 KB   unknown
lineitem                 tpc.org        16       2,999,671  368 MB              1