Data profiling comprises a broad range of methods to efficiently analyze a given data set. In a typical scenario, which mirrors the capabilities of commercial data profiling tools, tables of a relational database are scanned to derive metadata, such as data types and value patterns, completeness and uniqueness of columns, keys and foreign keys, and occasionally functional dependencies and association rules. Individual research projects have proposed several additional profiling tasks, such as the discovery of inclusion dependencies or conditional functional dependencies.
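
As a rough illustration of the single-column metadata mentioned above, the following minimal Python sketch computes completeness and uniqueness per column and lists columns that could serve as single-column key candidates. It does not use Metanome's API. The function names `profile_columns` and `candidate_keys`, the sample rows, and the exact definitions (completeness as the share of non-null values, uniqueness as distinct non-null values divided by non-null values) are assumptions made for this example; profiling tools may define these measures differently.

```python
from typing import Any, Dict, List, Optional

# A row maps column names to values; None stands for a missing (NULL) value.
Row = Dict[str, Optional[Any]]


def profile_columns(rows: List[Row]) -> Dict[str, Dict[str, float]]:
    """Return completeness and uniqueness for each column of a small table.

    Assumed definitions (illustrative only):
      completeness = non-null values / all rows
      uniqueness   = distinct non-null values / non-null values
    """
    if not rows:
        return {}
    total = len(rows)
    profile: Dict[str, Dict[str, float]] = {}
    for column in rows[0].keys():
        values = [row.get(column) for row in rows]
        non_null = [v for v in values if v is not None]
        distinct = set(non_null)
        profile[column] = {
            "completeness": len(non_null) / total,
            "uniqueness": len(distinct) / len(non_null) if non_null else 0.0,
        }
    return profile


def candidate_keys(rows: List[Row]) -> List[str]:
    """List columns that are fully populated and contain no duplicates."""
    return [
        column
        for column, stats in profile_columns(rows).items()
        if stats["completeness"] == 1.0 and stats["uniqueness"] == 1.0
    ]


# Hypothetical sample table used only for demonstration.
sample_rows: List[Row] = [
    {"id": 1, "city": "Berlin"},
    {"id": 2, "city": "Berlin"},
    {"id": 3, "city": None},
]

if __name__ == "__main__":
    for name, stats in profile_columns(sample_rows).items():
        print(name, stats)
    print("single-column key candidates:", candidate_keys(sample_rows))
```

This sketch scans every column in memory and considers only single columns. Discovering multi-column keys, functional dependencies, or inclusion dependencies needs dedicated algorithms, because the number of column combinations to check grows quickly with the number of columns.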

The Metanome project is a joint project between the Hasso-Plattner-Institut (HPI) and the Qatar Computing Research Institute (QCRI). Metanome provides a fresh view on data profiling by developing and integrating efficient algorithms into a common tool, expanding the functionality of data profiling, and addressing performance and scalability issues for Big Data. A vision of the project appears in SIGMOD Record ("Data Profiling Revisited"), and a demo of the Metanome profiling tool was given at VLDB 2015 ("Data Profiling with Metanome").

The project can be found on GitHub: https://github.com/HPI-Information-Systems/Metanome. The Metanome tool is supplied under the Apache License. You can use and extend the tool to develop your own profiling algorithms. The profiling algorithms contained in our downloadable Metanome build are under HPI copyright. You are free to use and distribute them for research purposes.


Current Metanome Developers:
  • Tanja Bergmann
  • Moritz Finke
Former Metanome Developers:
  • Carl Ambroselli
  • Jakob Zwiener
  • Claudia Exeler