Our group includes PostDocs, PhD students, and student assistants, and is headed by Prof. Felix Naumann. If you are interested in joining our team, please contact Felix Naumann.

For bachelor students, we offer German-language lectures on database systems in addition to paper- or project-oriented seminars. In a one-year bachelor project, students complete their studies in cooperation with external partners. For master students, we offer courses on information integration, data profiling, and information retrieval, complemented by specialized seminars and master projects, and we advise master theses.

Most of our research is conducted in the context of larger research projects, in collaboration across students, across groups, and across universities. We strive to make most of our datasets and source code available.

Please do not hesitate to reach out to us directly if you cannot find a paper, slides, or other research artifacts.

Content

Authors

Alexander Bilke, Felix Naumann

Abstract

Most data integration applications require a matching between the schemas of the respective data sets. We show how the existence of duplicates within these data sets can be exploited to automatically identify matching attributes. We describe an algorithm that first discovers duplicates among data sets with unaligned schemas and then uses these duplicates to perform schema matching between schemas with opaque column names.

Discovering duplicates among data sets with unaligned schemas is more difficult than in the usual setting, because it is not clear which fields in one object should be compared with which fields in the other. We have developed a new algorithm that efficiently finds the most likely duplicates in such a setting. Our schema matching algorithm then identifies corresponding attributes by comparing data values within those duplicate records. An experimental study on real-world data shows the effectiveness of this approach.
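The two steps described in the abstract can be illustrated with a small Python sketch. This is not the authors' implementation: the token-based Jaccard similarity, the brute-force column assignment, the helper names (tokens, jaccard, find_duplicates, match_attributes), and the sample records are all assumptions chosen to keep the example short. It first finds likely duplicates by comparing whole records while ignoring field boundaries, then averages field-wise similarities over those duplicates to derive a one-to-one attribute matching.

```python
from itertools import permutations


def tokens(values):
    """Flatten a sequence of field values into a set of lowercase tokens."""
    return {tok for value in values for tok in str(value).lower().split()}


def jaccard(a, b):
    """Jaccard similarity of two token sets (0.0 if both are empty)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def find_duplicates(left, right, k=2):
    """Return the k most similar (i, j) record pairs across both tables.

    Records are compared as bags of tokens, so no schema alignment is needed.
    """
    scored = [
        (jaccard(tokens(l), tokens(r)), i, j)
        for i, l in enumerate(left)
        for j, r in enumerate(right)
    ]
    scored.sort(reverse=True)
    return [(i, j) for _, i, j in scored[:k]]


def match_attributes(left, right, pairs):
    """Map each left column to a right column using the duplicate pairs.

    Assumes pairs is non-empty and the left table has no more columns
    than the right table.
    """
    n_left, n_right = len(left[0]), len(right[0])
    sim = [[0.0] * n_right for _ in range(n_left)]
    # Average the field-wise similarity over all duplicate pairs.
    for i, j in pairs:
        for a in range(n_left):
            for b in range(n_right):
                field_sim = jaccard(tokens([left[i][a]]), tokens([right[j][b]]))
                sim[a][b] += field_sim / len(pairs)
    # Brute-force search for the best one-to-one assignment.
    # Acceptable for small schemas only; larger ones need a proper
    # assignment solver.
    best = max(
        permutations(range(n_right), n_left),
        key=lambda p: sum(sim[a][p[a]] for a in range(n_left)),
    )
    return dict(enumerate(best))


# Hypothetical sample data: same kind of entities, different column order,
# no usable column names.
left = [
    ("Alice Smith", "Berlin", "030 1234"),
    ("Bob Jones", "Potsdam", "0331 5509"),
    ("Carol White", "Hamburg", "040 9876"),
]
right = [
    ("0331 5509", "Bob Jones", "Potsdam"),
    ("089 5555", "Dave Black", "Munich"),
    ("030 1234", "Alice Smith", "Berlin"),
]

pairs = find_duplicates(left, right, k=2)
mapping = match_attributes(left, right, pairs)
print(pairs)    # duplicate record pairs as (left index, right index)
print(mapping)  # left column index -> right column index
```

In this toy data, the two exact duplicates are enough to recover that the left columns (name, city, phone) correspond to the right columns 1, 2, and 0. Real data would call for a similarity measure that tolerates typos and for a weighting scheme that favors rare tokens.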

Algorithm

Dumas (Duplicate-based Matching of Schemas)

Chair

Prof. Dr. Felix Naumann

Information Systems

E-Mail: felix.naumann(at)hpi.de

Assistant: Diana Stephan

Office: Campus II, House F, F-2.01
Tel.: +49 (0)331 5509-280
Fax: +49 (0)331 5509-287
E-Mail: office-naumann(at)hpi.de

To visit us, please see these directions.

Project highlights

Metanome: Big Data Profiling

Data Preparation

Janus: Change exploration

KITQAR: AI and Data Quality


News

03.04.2024 | Congratulations on the EDBT Best Paper Award!

05.03.2024 | Another paper marked as reproducible by the pVLDB Reproducibility Committee

21.01.2024 | Paper accepted at W-NUT 2024

19.12.2023 | Congratulations Dr. Gerardo Vitagliano!

13.12.2023 | Two papers accepted at EDBT Conference 2024
