Our group includes postdocs, PhD students, and student assistants, and is headed by Prof. Felix Naumann. If you are interested in joining our team, please contact Felix Naumann.

For bachelor students we offer German-language lectures on database systems in addition to paper- or project-oriented seminars. Within a one-year bachelor project, students finalize their studies in cooperation with external partners. For master students we offer courses on information integration, data profiling, and information retrieval, complemented by specialized seminars and master projects. We also advise master theses.

Most of our research is conducted in the context of larger research projects, in collaboration across students, across groups, and across universities. We strive to make most of our datasets and source code available.

Please do not hesitate to reach out to us directly if you cannot find a paper, slides, or other research artifacts.

Profiling data to determine metadata about a given dataset is an important and frequent activity of any IT professional and researcher, and is necessary for various use cases. It encompasses a vast array of methods to examine datasets and produce metadata. Among the simpler results are statistics, such as the number of null values and distinct values in a column, its data type, or the most frequent patterns of its data values. Metadata that are more difficult to compute involve multiple columns, namely correlations, unique column combinations, functional dependencies, and inclusion dependencies. Further techniques detect conditional properties of the dataset at hand.
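To make these kinds of metadata concrete, here is a minimal Python sketch of single-column statistics and naive multi-column checks. It is an illustration only, not the method used in Metanome or any other project named on this page. The function names (`value_pattern`, `infer_type`, `profile_column`, `holds_fd`, `is_unique`) and the sample rows are hypothetical, and the checks scan the data exhaustively rather than using an efficient dependency detection algorithm.

```python
import re
from collections import Counter


def value_pattern(value):
    """Generalize a value: every digit becomes 9, every ASCII letter becomes a."""
    pattern = re.sub(r"[0-9]", "9", value)
    return re.sub(r"[A-Za-z]", "a", pattern)


def infer_type(values):
    """Guess a column type by trying to parse every non-null value."""
    def all_parse(cast):
        try:
            for v in values:
                cast(v)
            return True
        except ValueError:
            return False

    if not values:
        return "unknown"
    if all_parse(int):
        return "integer"
    if all_parse(float):
        return "float"
    return "string"


def profile_column(values, top_k=3):
    """Single-column statistics: nulls, distinct values, type, frequent patterns."""
    non_null = [v for v in values if v is not None and v != ""]
    patterns = Counter(value_pattern(v) for v in non_null)
    return {
        "nulls": len(values) - len(non_null),
        "distinct": len(set(non_null)),
        "type": infer_type(non_null),
        "top_patterns": patterns.most_common(top_k),
    }


def holds_fd(rows, lhs, rhs):
    """Naive functional dependency check: equal lhs values imply equal rhs values."""
    seen = {}
    for row in rows:
        key = tuple(row[c] for c in lhs)
        if seen.setdefault(key, row[rhs]) != row[rhs]:
            return False
    return True


def is_unique(rows, columns):
    """Naive unique column combination check: no two rows share the same values."""
    keys = [tuple(row[c] for c in columns) for row in rows]
    return len(set(keys)) == len(keys)


# Hypothetical sample data, all values stored as strings (or None for null).
rows = [
    {"zip": "14482", "city": "Potsdam", "name": "Ada"},
    {"zip": "14482", "city": "Potsdam", "name": "Bob"},
    {"zip": "10115", "city": "Berlin", "name": None},
]

zip_profile = profile_column([r["zip"] for r in rows])
name_profile = profile_column([r["name"] for r in rows])
zip_determines_city = holds_fd(rows, ["zip"], "city")
city_determines_name = holds_fd(rows, ["city"], "name")
zip_name_unique = is_unique(rows, ["zip", "name"])
```

On the sample rows, `zip` determines `city`, while `city` does not determine `name`. Each check here tests a single candidate. Discovering all dependencies requires testing a number of column combinations that grows exponentially with the number of columns, which is why efficient and scalable algorithms are a research topic.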

In our research projects we try to develop efficient and scalable dependency detection algorithms, both for relational data in the Metanome project and for RDF data in the ProLOD++ project. Please see the menu for more projects.

Current projects

Metanome: A framework and application for efficient profiling algorithms on large relational datasets
Janus: Project on data change exploration

Completed projects

ProLOD and ProLOD++: An interactive application to profile RDF data
MetaCrate: A database for data profiles
Mining RDF data: Synonym discovery, ontology alignment, and data enrichment
Stratosphere data profiling: Distributed data profiling algorithms for Stratosphere and other distributed processing platforms
SPIDER: An efficient algorithm to detect inclusion dependencies and foreign keys
BTC: The results of our two participations in the Billion Triples Challenges
XStruct: Automatically extract schemata from XML documents

Chair

Prof. Dr. Felix Naumann

Information Systems

E-Mail: felix.naumann(at)hpi.de

Assistant: Diana Stephan

Office: Campus II, House F, F-2.01
Tel.: +49 (0)331 5509-280
E-Mail: office-naumann(at)hpi.de

To visit us, please see these directions.

Project highlights

Metanome: Big Data Profiling

Data Preparation

Janus: Change exploration

KITQAR: AI and Data Quality

News

06.10.2024 | Paper accepted at EDBT 2025

06.09.2024 | Congratulations Dr. Phillip Wenig!

06.09.2024 | Congratulations Dr. Mazhar Hameed!

16.07.2024 | Congratulations Dr. Leon Bornemann-Paulus!

23.05.2024 | Paper accepted at NLDB 2024
