Prof. Dr. Felix Naumann

Metanome Tool and Profiling Algorithms

The Metanome profiling tool is a framework for various profiling algorithms. It handles both algorithms and datasets as external resources, which is why the tool itself contains no profiling algorithms. Treating algorithms as external resources is a design decision that allows researchers to contribute profiling functionality without changing the tool itself. This makes Metanome an open profiling platform for both algorithm engineers and data scientists. The following image depicts the architecture of the profiling tool:

Metanome Tool

The newest version of the Metanome profiling tool is always available on GitHub. If you cannot build the sources, you can also download the following (less up-to-date) Metanome binaries:

Metanome Profiling Tool (v0.0.2, v1.0, v1.1)

Metanome Algorithms

In the context of the Metanome data profiling project, we developed and re-implemented the following profiling algorithms. To run a profiling algorithm, place the corresponding JAR file into the folder /WEB-INF/classes/algorithms and register it in the Metanome frontend. If you want to write your own profiling algorithm for the Metanome tool, we recommend this Skeleton Project as a starting point for your development.
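The file-placement step described above could be scripted. The following is a minimal sketch, not part of Metanome: the function name `install_algorithm`, the example file name `MyAlgorithm.jar`, and the location of the deployment root are assumptions made for illustration. Only the relative folder WEB-INF/classes/algorithms comes from the text, and registering the algorithm in the Metanome frontend remains a separate manual step.

```python
import shutil
import tempfile
from pathlib import Path

# Relative folder named in the text. The deployment root it lives under
# depends on the installation and is an assumption here.
ALGORITHM_SUBDIR = Path("WEB-INF") / "classes" / "algorithms"


def install_algorithm(jar_file: Path, metanome_root: Path) -> Path:
    """Copy an algorithm JAR into <metanome_root>/WEB-INF/classes/algorithms.

    Hypothetical helper: it only copies the file and does not register
    the algorithm in the Metanome frontend.
    """
    if jar_file.suffix.lower() != ".jar":
        raise ValueError(f"expected a .jar file, got {jar_file.name}")
    if not jar_file.is_file():
        raise FileNotFoundError(jar_file)
    target_dir = metanome_root / ALGORITHM_SUBDIR
    target_dir.mkdir(parents=True, exist_ok=True)
    # copy2 preserves file metadata and returns the destination path
    return Path(shutil.copy2(jar_file, target_dir))


if __name__ == "__main__":
    # Demonstration in a temporary directory with a placeholder file,
    # so no real Metanome installation is touched.
    with tempfile.TemporaryDirectory() as tmp:
        demo_root = Path(tmp) / "metanome"
        demo_jar = Path(tmp) / "MyAlgorithm.jar"  # hypothetical file name
        demo_jar.write_bytes(b"placeholder")
        print(install_algorithm(demo_jar, demo_root))
```

After the copy, the algorithm still has to be registered in the Metanome frontend before it can be run.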

Unique Column Combination (Key Discovery)

Inclusion Dependency (Foreign-Key Discovery) Repeatability-Page

Functional Dependencies (Normalization) Repeatability-Page

Order Dependencies (Data Ordering) Repeatability-Page

Basic Statistics