Hasso-Plattner-Institut
Prof. Dr. Felix Naumann
  
 

Metanome Tool and Profiling Algorithms

The Metanome profiling tool is a framework for various profiling algorithms. It handles both algorithms and datasets as external resources, which is why the tool itself contains no profiling algorithms. Treating algorithms as external resources is a design decision that allows researchers to contribute profiling functionality without changing the tool itself. This makes Metanome an open profiling platform for both algorithm engineers and data scientists. The following image depicts the architecture of the profiling tool:

Metanome Tool

The newest version of the Metanome profiling tool is always available on GitHub. If you cannot build the sources, you can also download the following (less up-to-date) Metanome binaries:

Metanome Profiling Tool (v0.0.2, v1.0, v1.1)

Metanome Datasets

To load a dataset into the Metanome tool, place it in the folder /WEB-INF/classes/inputData. It will then appear in the import list in the frontend. Datasets must be relational and stored in a CSV or TSV format. The separator and quote characters can be defined in the frontend when importing each dataset. As an alternative to file imports, one can specify a database connection in the frontend.
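Before copying a file into the input folder, it can help to confirm that it parses as a relational table with the separator and quote characters you intend to enter in the frontend. The following is a minimal sketch using Python's standard csv module, not part of Metanome. The function name check_relational and the sample data are hypothetical and serve only as illustration.

```python
import csv
import io


def check_relational(text, separator=",", quote='"'):
    """Return (column_count, line_count) if every non-empty line has the same
    number of fields; raise ValueError otherwise. The header line is counted."""
    reader = csv.reader(io.StringIO(text), delimiter=separator, quotechar=quote)
    rows = [row for row in reader if row]  # skip completely empty lines
    if not rows:
        raise ValueError("no rows found")
    width = len(rows[0])
    for number, row in enumerate(rows, start=1):
        if len(row) != width:
            raise ValueError(
                f"row {number} has {len(row)} fields, expected {width}"
            )
    return width, len(rows)


# Hypothetical sample: semicolon-separated, with a quoted field that
# itself contains the separator.
sample = 'id;name\n1;"Smith; John"\n2;Doe\n'
print(check_relational(sample, separator=";"))  # (2, 3)
```

A file that fails this check with the chosen characters is likely to be misread on import under the same settings, so it is worth adjusting the separator or quote character first.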

We provide some test datasets on our repeatability page.

Metanome Algorithms

In the context of the Metanome data profiling project, we developed and re-implemented the following profiling algorithms. To run a profiling algorithm, place the corresponding JAR file into the folder /WEB-INF/classes/algorithms and register it in the Metanome frontend. If you want to write your own profiling algorithm for the Metanome tool, we recommend this Skeleton Project as a starting point for your development.

Unique Column Combination (Key Discovery)

Inclusion Dependency (Foreign-Key Discovery) repeatability page

Functional Dependencies (Normalization) repeatability page

Multivalued Dependencies (Normalization)

Order Dependencies (Data Ordering) repeatability page

Basic Statistics 

Cardinality Estimation (Zeroth-frequency moment of the dataset) repeatability page

Schema Normalization

  • Normalize (v1.1) (Boyce-Codd Normal Form)
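To make two of the dependency types listed above concrete, the following sketch checks by brute force whether a column combination is unique and whether a functional dependency holds on a small table. It only illustrates the definitions and is not one of the Metanome algorithms, which search for all such dependencies far more efficiently. The function names and the sample rows are hypothetical.

```python
def is_unique(rows, columns):
    """True if no two rows agree on all of the given column indices."""
    seen = set()
    for row in rows:
        key = tuple(row[c] for c in columns)
        if key in seen:
            return False
        seen.add(key)
    return True


def fd_holds(rows, lhs, rhs):
    """True if rows that agree on the lhs columns also agree on the rhs column."""
    mapping = {}
    for row in rows:
        key = tuple(row[c] for c in lhs)
        # Remember the first rhs value seen for this lhs value and compare.
        if mapping.setdefault(key, row[rhs]) != row[rhs]:
            return False
    return True


# Hypothetical sample table with columns: id, city, country
rows = [
    ("1", "Berlin", "DE"),
    ("2", "Potsdam", "DE"),
    ("3", "Berlin", "DE"),
]
print(is_unique(rows, [0]))    # True: id is a unique column combination
print(is_unique(rows, [1]))    # False: "Berlin" appears twice
print(fd_holds(rows, [1], 2))  # True: city determines country
print(fd_holds(rows, [2], 1))  # False: country does not determine city
```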