Our group includes PostDocs, PhD students, and student assistants, and is headed by Prof. Felix Naumann. If you are interested in joining our team, please contact Felix Naumann.

For bachelor students we offer German lectures on database systems in addition to paper- or project-oriented seminars. Within a one-year bachelor project, students finalize their studies in cooperation with external partners. For master students we offer courses on information integration, data profiling, and information retrieval, complemented by specialized seminars and master projects. We also advise master theses.

Most of our research is conducted in the context of larger research projects, in collaboration across students, across groups, and across universities. We strive to make available most of our datasets and source code.

Please do not hesitate to contact us directly if you cannot find a paper, slides, or other research artifacts.

SPIDER - Data Profiling

Many real-life databases lack sufficient structural information such as foreign keys. These constraints are often not defined for performance reasons, because of lacking knowledge of the data, or because of dirty data, which does not entirely satisfy the constraints. Thus, we want to detect foreign keys automatically.

This problem can be divided into two steps. First, find all inclusion dependencies, i.e., pairs of attributes A and B such that every value of A also occurs among the values of B. This definition matches the syntactic, automatically testable part of a foreign key constraint. In the second step, we want to find heuristics that filter the foreign keys out of the detected inclusion dependencies.
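The definition of an inclusion dependency can be illustrated with a minimal Python sketch. The function name and the sample columns are hypothetical and chosen only for illustration; the sketch holds all values in memory and is not how the test is carried out on large schemas.

```python
def is_inclusion_dependency(dependent_values, referenced_values):
    """Return True if every dependent value also occurs among the referenced values.

    Hypothetical helper: duplicates are irrelevant, so both columns are
    reduced to sets before the subset test.
    """
    return set(dependent_values) <= set(referenced_values)


# Hypothetical sample data: customer ids used in orders vs. all customer ids.
order_customer_ids = [3, 1, 1, 2]
customer_ids = [1, 2, 3, 4]

print(is_inclusion_dependency(order_customer_ids, customer_ids))  # True
print(is_inclusion_dependency(customer_ids, order_customer_ids))  # False
```

The second call shows that the relation is directed: the value 4 occurs only among the customer ids, so the dependency holds in one direction only.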

We developed our algorithm SPIDER (Single Pass Inclusion DEpendency Recognition) to detect inclusion dependencies over large schemas. The challenge is that the number of attribute pairs to test grows quadratically with the number of attributes. SPIDER first sorts the values of each attribute and removes duplicates ("distinct") within the database system. Afterwards, it tests all attribute pairs in parallel while reading each value at most once. We showed that SPIDER clearly outperforms previous approaches for detecting inclusion dependencies exactly.
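The two phases described above can be sketched in Python as follows. This is a simplified in-memory illustration under several assumptions, not the published implementation: sorting and duplicate removal happen in Python rather than in the database system, values within the schema are assumed to be mutually comparable, NULLs (None) are ignored, and the function name spider_sketch is hypothetical.

```python
import heapq


def spider_sketch(columns):
    """Find all pairs (A, B) such that every value of A occurs in B.

    columns: dict mapping an attribute name to an iterable of values.
    Simplified sketch: each sorted, duplicate-free value list is read
    exactly once, front to back, through a shared min-heap.
    """
    # Phase 1: sort each attribute and remove duplicates (None is skipped).
    sorted_cols = {
        name: sorted({v for v in values if v is not None})
        for name, values in columns.items()
    }

    # Initially every other attribute is a candidate referenced attribute.
    candidates = {a: set(sorted_cols) - {a} for a in sorted_cols}

    # Phase 2: merge all sorted lists; the heap holds one cursor per attribute.
    heap = [(vals[0], name, 0) for name, vals in sorted_cols.items() if vals]
    heapq.heapify(heap)

    while heap:
        value = heap[0][0]
        group = []  # all attributes whose current value equals the minimum
        while heap and heap[0][0] == value:
            _, name, pos = heapq.heappop(heap)
            group.append((name, pos))

        names = {name for name, _ in group}
        for name, pos in group:
            # Only attributes that also contain this value remain candidates.
            candidates[name] &= names
            nxt = pos + 1
            if nxt < len(sorted_cols[name]):
                heapq.heappush(heap, (sorted_cols[name][nxt], name, nxt))

    return {(a, b) for a, refs in candidates.items() for b in refs}


# Hypothetical example schema.
example = {
    "orders.customer_id": [3, 1, 1, 2],
    "customers.id": [1, 2, 3, 4],
    "items.id": [10, 11],
}
print(spider_sketch(example))  # {('orders.customer_id', 'customers.id')}
```

In this sketch an attribute without any values is trivially included in every other attribute, so such pairs appear in the result; a practical tool would probably filter them out.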

An extension of SPIDER detects partial inclusion dependencies to handle dirty data and detects composite inclusion dependencies to cover composite keys and foreign keys.
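Both extensions can be illustrated with a short Python sketch. The text does not specify how partial inclusion is measured, so the coverage ratio over distinct values and the threshold parameter used here are assumptions, and all function names are hypothetical.

```python
def inclusion_coverage(dependent_values, referenced_values):
    """Fraction of distinct dependent values that occur among the referenced values.

    Assumed measure for partial inclusion; an empty dependent column counts as 1.0.
    """
    dependent = set(dependent_values)
    if not dependent:
        return 1.0
    return len(dependent & set(referenced_values)) / len(dependent)


def is_partial_inclusion_dependency(dependent_values, referenced_values, threshold=0.95):
    """Accept the dependency if at least `threshold` of the distinct values are covered."""
    return inclusion_coverage(dependent_values, referenced_values) >= threshold


def is_composite_inclusion_dependency(dependent_rows, referenced_rows):
    """Test inclusion on value combinations (tuples) instead of single values."""
    return set(map(tuple, dependent_rows)) <= set(map(tuple, referenced_rows))


# Dirty data: one of four distinct values (99) has no match.
print(inclusion_coverage([1, 2, 3, 99], [1, 2, 3, 4]))  # 0.75

# Composite case: 1 and "b" each occur on the right, but never together.
referenced = [(1, "a"), (2, "b"), (2, "a")]
print(is_composite_inclusion_dependency([(1, "a"), (2, "b")], referenced))  # True
print(is_composite_inclusion_dependency([(1, "b")], referenced))            # False
```

The last call shows why composite dependencies cannot be derived from single-attribute results alone: each component may be included while the combination is not.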

SPIDER was developed in the context of the Aladin project.

Publications

Jana Bauckmann, Ulf Leser, Felix Naumann, Véronique Tietz: Efficiently Detecting Inclusion Dependencies. International Conference on Data Engineering (ICDE 2007), Istanbul, Turkey, (poster paper, to appear).
Jana Bauckmann, Ulf Leser, Felix Naumann, Joachim Schmid: Data Profiling: Effiziente Fremdschlüsselerkennung mit Aladin. German Information Quality Conference & Workshop, Bad Soden, November 2006.
Jana Bauckmann: Efficiently Identifying Inclusion Dependencies in RDBMS. 18. Workshop über Grundlagen von Datenbanken (GI-Workshop), Wittenberg, June 2006.
Jana Bauckmann, Ulf Leser, Felix Naumann: Efficiently Computing Inclusion Dependencies for Schema Discovery. Workshop InterDB (with ICDE06), Atlanta, April 2006.

Chair

Prof. Dr. Felix Naumann

Information Systems

E-Mail: felix.naumann(at)hpi.de

Assistant: Diana Stephan

Office: Campus II, House F, F-2.01
Tel.: +49 (0)331 5509-280
E-Mail: office-naumann(at)hpi.de

