Data profiling comprises a broad range of methods to efficiently analyze a given data set. In a typical scenario,
which mirrors the capabilities of commercial data profiling tools, tables of a relational database are scanned to
derive metadata, such as data types and value patterns, completeness and uniqueness of columns, keys and foreign
keys, and occasionally functional dependencies and association rules. Individual research projects have proposed
several additional profiling tasks, such as the discovery of inclusion dependencies or conditional functional
dependencies.
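As a rough illustration of the basic single-column metadata mentioned above (data types, completeness, uniqueness, key candidates), here is a minimal sketch in Python. It is not Metanome code and does not use Metanome's API; the function names `profile_column` and `profile_table` and the sample table are hypothetical, and a table is assumed to be a list of tuples with `None` marking a missing value.

```python
from collections import Counter


def profile_column(values):
    """Return basic metadata for one column (hypothetical helper, not Metanome API)."""
    present = [v for v in values if v is not None]
    distinct = set(present)
    # Crude data type guess: the most frequent Python type among non-missing values.
    type_counts = Counter(type(v).__name__ for v in present)
    return {
        "data_type": type_counts.most_common(1)[0][0] if present else None,
        # Fraction of cells that are not missing.
        "completeness": len(present) / len(values) if values else 0.0,
        # Fraction of non-missing cells that carry a distinct value.
        "uniqueness": len(distinct) / len(present) if present else 0.0,
        # Key candidate: no missing values and no duplicates.
        "key_candidate": bool(values) and len(distinct) == len(values),
    }


def profile_table(rows, columns):
    """Profile every column of a table given as a list of row tuples."""
    return {
        name: profile_column([row[i] for row in rows])
        for i, name in enumerate(columns)
    }


# Made-up sample data for illustration only.
columns = ["id", "name", "city"]
rows = [
    (1, "Ada", "Berlin"),
    (2, "Bob", None),
    (3, "Cy", "Berlin"),
    (4, "Ada", "Doha"),
]
profile = profile_table(rows, columns)
```

In this sample, only `id` qualifies as a key candidate. Multi-column tasks such as functional or inclusion dependency discovery require searching over column combinations and are well beyond the scope of this sketch.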
The Metanome project is a joint project between the Hasso-Plattner-Institut
(HPI) and the Qatar Computing Research Institute (QCRI).
Metanome provides a fresh view on data profiling by developing and integrating efficient algorithms into a common
tool, expanding on the functionality of data profiling, and addressing performance and scalability issues for
Big Data. A vision of the project appears in SIGMOD Record, "Data Profiling Revisited", and a demo of the
Metanome profiling tool was given at VLDB 2015, "Data Profiling with Metanome".
The project can be found on GitHub:
https://github.com/HPI-Information-Systems/Metanome. The Metanome tool is supplied under the Apache License.
You can use and extend the tool to develop your own profiling algorithms. The profiling algorithms contained in
our downloadable Metanome build are copyrighted by HPI. You are free to use and distribute them for research purposes.
Current Metanome Developers:
- Tanja Bergmann
- Moritz Finke
Former Metanome Developers:
- Carl Ambroselli
- Jakob Zwiener
- Claudia Exeler