Hasso-Plattner-Institut
Prof. Dr. Felix Naumann
  
 

Repeatability - DCs

This is a repeatability page for DC discovery algorithms. The algorithms are provided in the state in which their results were published, so they may not reflect the most recent version of their implementations.

DC Algorithms

The efficient discovery of denial constraints (DCs) in tables is a challenging task. We have not yet published any algorithms for this problem; as soon as we have a publication, we will add the respective algorithms here.
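
For readers unfamiliar with the term, the following is a minimal sketch of what checking a single denial constraint against a table involves. It is not one of the algorithms referred to on this page. The toy table, the attribute names (state, salary, tax), and the helper `violations` are hypothetical and chosen only for illustration. A denial constraint states that no pair of tuples may satisfy all of its predicates at once, so this naive check compares every ordered pair of rows.

```python
from itertools import permutations
import operator

# Comparison operators that may appear in a predicate.
OPS = {
    "=": operator.eq, "!=": operator.ne,
    "<": operator.lt, "<=": operator.le,
    ">": operator.gt, ">=": operator.ge,
}

def violations(rows, predicates):
    """Return all ordered index pairs (i, j) whose rows satisfy every
    predicate (t1.a op t2.b), i.e. the pairs that violate the DC."""
    found = []
    for (i, t1), (j, t2) in permutations(enumerate(rows), 2):
        if all(OPS[op](t1[a], t2[b]) for a, op, b in predicates):
            found.append((i, j))
    return found

# Hypothetical toy table, not one of the datasets listed on this page.
rows = [
    {"state": "NY", "salary": 50000, "tax": 5000},
    {"state": "NY", "salary": 60000, "tax": 4000},
    {"state": "CA", "salary": 40000, "tax": 6000},
]

# DC: not (t1.state = t2.state and t1.salary > t2.salary and t1.tax < t2.tax)
dc = [("state", "=", "state"), ("salary", ">", "salary"), ("tax", "<", "tax")]

print(violations(rows, dc))  # [(1, 0)]
```

Even this single check is quadratic in the number of rows. Discovery additionally has to search the space of possible predicate combinations, which is one reason the task is hard to do efficiently.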

The data profiling tool Metanome provides standardized interfaces to facilitate the comparison of different DC discovery methods.

Datasets

All DC algorithms have been exhaustively tested on the following datasets:

Name      Source                          Columns  Rows       Size
Hospital  https://data.medicare.gov/      15       114,919    30.6 MB
Stock     http://pages.swcp.com/stocks/   7        122,496    5.3 MB
Tax       -                               15       1,000,000  73 MB