Prof. Dr. Felix Naumann

What is Metacrate?

In a few words, Metacrate is a database for data profiles. Data management applications can use it as a library to store, organize, and analyze data profiles in many different ways. Technically, Metacrate consists of a logical data model that can be hosted on several storage backends, an analytics engine to query and integrate data profiles, and a library of common data management algorithms. Also have a look at our data profiling tool Metanome, whose profiling results can easily be imported into Metacrate to get started.

For an example of how to get started with Metacrate and reverse engineer a dataset, check out our screencast below!

Getting started

Metacrate is hosted on GitHub as an open source project. You are free to use it as a library in your own projects. If you just want to play around with Metacrate, you can do so within a Jupyter notebook that runs the Jupyter-Scala kernel. In this setup, we also provide enhanced visualizations based on Plotly and D3. For installation instructions, visit the repository. Also, check out our example notebooks that demonstrate how Metacrate can support data anamnesis, data cleaning, and data discovery.

Below, we show extracts from the notebooks. Further resources can be found at the bottom of this page.

Example 1

Analyze the table sizes in a dataset by combining their numbers of columns and tuples and displaying the result in a scatter chart. 
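
To make the idea concrete, here is a minimal sketch in plain Python. It is not taken from the notebooks and does not use Metacrate's API. The table names, the counts, and the helper `table_sizes` are all hypothetical. The sketch only shows the core step: joining per-table column counts with per-table tuple counts so that each table becomes one point in a scatter chart.

```python
# Hypothetical profile data (not real Metacrate output): one entry per table.
column_counts = {"customer": 8, "orders": 9, "lineitem": 16}
tuple_counts = {"customer": 150_000, "orders": 1_500_000, "lineitem": 6_001_215}


def table_sizes(column_counts, tuple_counts):
    """Join both statistics on the table name.

    Tables that lack either statistic are dropped, so every returned
    triple (table, columns, tuples) is a complete scatter-chart point.
    """
    shared = sorted(column_counts.keys() & tuple_counts.keys())
    return [(name, column_counts[name], tuple_counts[name]) for name in shared]


points = table_sizes(column_counts, tuple_counts)
for name, columns, tuples in points:
    print(f"{name}: {columns} columns, {tuples} tuples")
```

Each triple can then be handed to any plotting library, with the number of columns on one axis and the number of tuples on the other. A logarithmic scale for the tuple axis usually helps, because table sizes often differ by orders of magnitude.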

Example 2

Show the information content of tables as well as the information content of join relationships to other tables. Display the result in a chord plot.
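
The text does not say how information content is measured. The following sketch therefore rests on stated assumptions. It takes the information content of a column to be the Shannon entropy of its values, that of a table to be the sum of its column entropies, and that of a join relationship to be the entropy of the referencing column. The data, the `joins` list, and all function names are hypothetical, and the code does not use Metacrate's API. It builds the square matrix that a chord plot typically takes as input.

```python
import math
from collections import Counter


def entropy(values):
    """Shannon entropy (in bits) of a list of column values."""
    total = len(values)
    return -sum((c / total) * math.log2(c / total) for c in Counter(values).values())


# Hypothetical toy dataset: table name -> column name -> values.
tables = {
    "orders": {"o_id": [1, 2, 3, 4], "cust": ["a", "a", "b", "b"]},
    "customer": {"c_id": ["a", "b"]},
}
# Hypothetical join relationships: (referencing table, referencing column, referenced table).
joins = [("orders", "cust", "customer")]

names = sorted(tables)
index = {name: i for i, name in enumerate(names)}

# Diagonal: assumed information content of each table (sum of column entropies).
matrix = [[0.0] * len(names) for _ in names]
for name, columns in tables.items():
    matrix[index[name]][index[name]] = sum(entropy(v) for v in columns.values())

# Off-diagonal: assumed information content of each join (entropy of the referencing column).
for source, column, target in joins:
    weight = entropy(tables[source][column])
    matrix[index[source]][index[target]] += weight
    matrix[index[target]][index[source]] += weight
```

The resulting `matrix`, together with `names` as labels, is in the shape most chord plot implementations accept. Other definitions of information content would only change how the diagonal and off-diagonal weights are computed.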

Example 3

Having assessed the importance of tables and having clustered the tables, plot the enriched schema graph.


If you run into trouble with Metacrate, please file a GitHub issue. For other questions or feedback, please contact Sebastian Kruse.