Our group includes postdocs, PhD students, and student assistants, and is headed by Prof. Felix Naumann. If you are interested in joining our team, please contact him.

For bachelor students, we offer German-language lectures on database systems as well as paper- or project-oriented seminars. In a one-year bachelor project, students complete their studies in cooperation with external partners. For master students, we offer courses on information integration, data profiling, and information retrieval, complemented by specialized seminars and master projects, and we advise master theses.

Most of our research is conducted in the context of larger research projects, in collaboration among students, groups, and universities. We strive to make most of our datasets and source code available.

Please do not hesitate to reach out to us directly if you cannot find a paper, slides, or other research artifacts.

Similarity Search

Similarity search is the task of finding the objects in a set that are similar to a given query. Common database management systems (DBMS) provide efficient means only for finding exact matches to a query. When a query contains typing errors, omitted or transposed attribute values, or other typical data quality problems, exact search algorithms fail to find all relevant objects in the queried data set.
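The difference between exact and similarity search can be illustrated with a minimal sketch. It assumes Levenshtein edit distance as the similarity measure and a linear scan over a small list of names; the function names, the sample records, and the distance threshold are hypothetical and chosen only for illustration, not taken from the project.

```python
def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance, computed row by row with dynamic programming."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # delete ca
                            curr[j - 1] + 1,      # insert cb
                            prev[j - 1] + cost))  # match or substitute
        prev = curr
    return prev[-1]


def exact_search(query: str, records: list[str]) -> list[str]:
    """Return only records identical to the query."""
    return [r for r in records if r == query]


def similarity_search(query: str, records: list[str], max_distance: int = 2) -> list[str]:
    """Return records within max_distance edits of the query (linear scan)."""
    return [r for r in records if edit_distance(query, r) <= max_distance]


# Hypothetical sample data; the query contains a typo (missing final "n").
records = ["Naumann", "Neumann", "Lange", "Fenz"]
print(exact_search("Nauman", records))       # []
print(similarity_search("Nauman", records))  # ['Naumann', 'Neumann']
```

The linear scan compares the query against every record, which is why it does not scale to large data sets without an index.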

In this project, we survey existing algorithms for effective and efficient similarity search and develop new ones. Effective similarity search can be achieved by defining a similarity measure that is well-suited for the given domain. Efficient similarity search requires an index structure that precomputes similarities of objects so that queries can be answered as fast as possible.
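As a rough illustration of how an index can avoid comparing the query with every object, the following sketch builds an inverted index over character q-grams and uses Jaccard similarity of q-gram sets as the measure. This is a generic textbook technique substituted here for illustration; it is not the index structure developed in the project, and the class name, parameters, and sample data are assumptions.

```python
from collections import defaultdict


def qgrams(s: str, q: int = 2) -> set[str]:
    """Set of character q-grams of s, padded with '#' at both ends."""
    padded = "#" * (q - 1) + s.lower() + "#" * (q - 1)
    return {padded[i:i + q] for i in range(len(padded) - q + 1)}


def jaccard(a: set[str], b: set[str]) -> float:
    """Jaccard similarity of two sets; two empty sets count as identical."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


class QGramIndex:
    """Inverted index from q-gram to the ids of the records containing it."""

    def __init__(self, records, q: int = 2):
        self.q = q
        self.records = list(records)
        self.grams = [qgrams(r, q) for r in self.records]  # precomputed once
        self.postings = defaultdict(set)
        for rid, grams in enumerate(self.grams):
            for g in grams:
                self.postings[g].add(rid)

    def search(self, query: str, threshold: float = 0.5):
        """Return (similarity, record) pairs with similarity >= threshold.

        Only records sharing at least one q-gram with the query are scored.
        For any threshold > 0 this loses no results, because a record with
        no shared q-gram has Jaccard similarity 0.
        """
        query_grams = qgrams(query, self.q)
        candidates = set()
        for g in query_grams:
            candidates |= self.postings.get(g, set())
        scored = ((jaccard(query_grams, self.grams[rid]), self.records[rid])
                  for rid in candidates)
        return sorted((hit for hit in scored if hit[0] >= threshold), reverse=True)


# Hypothetical sample data.
index = QGramIndex(["Naumann", "Neumann", "Lange", "Fenz"])
print(index.search("Nauman"))  # [(0.875, 'Naumann'), (0.5, 'Neumann')]
```

The q-gram sets are computed once at indexing time, so a query only touches the posting lists of its own q-grams and scores the resulting candidates instead of scanning the whole data set.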

This project is supported by SCHUFA Holding AG.

Project members:

Master's theses:

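Matthias Pohl: Automatisierte Konfiguration des D-Index zur Ähnlichkeitssuche [Automated Configuration of the D-Index for Similarity Search], 2011
Dandy Fenz: Effiziente Ähnlichkeitssuche in einer großen Menge von Zeichenketten mittels Key-Value-Store [Efficient Similarity Search in a Large Set of Strings Using a Key-Value Store], 2011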

Publications

Bulk Sorted Access for Efficient Top-k Retrieval. Lange, Dustin; Naumann, Felix (2013).

Cost-Aware Query Planning for Similarity Search. Lange, Dustin; Naumann, Felix in Information Systems (IS) (2013). 38(4) 455–469.

Efficient Similarity Search in Very Large String Sets. Fenz, Dandy; Lange, Dustin; Rheinländer, Astrid; Naumann, Felix; Leser, Ulf (2012).

Scalable Similarity Search with Dynamic Similarity Measures. Köppelmann, Martin; Lange, Dustin; Lehmann, Claudia; Marszalkowski, Marika; Naumann, Felix; Retzlaff, Peter; Stange, Sebastian; Voget, Lea (2012).

Projektseminar "Similarity Search Algorithms". Lange, Dustin; Vogel, Tobias; Draisbach, Uwe; Naumann, Felix in Datenbank-Spektrum (2011). 11(1) 51–57.

Efficient Similarity Search: Arbitrary Similarity Measures, Arbitrary Composition. Lange, Dustin; Naumann, Felix (2011). 1679–1688.

Frequency-aware Similarity Measures. Lange, Dustin; Naumann, Felix (2011). 243–248.

Chair

Prof. Dr. Felix Naumann

Information Systems

E-Mail: felix.naumann(at)hpi.de

Assistant: Diana Stephan

Office: Campus II, House F, F-2.01
Tel.: +49 (0)331 5509-280
E-Mail: office-naumann(at)hpi.de

To visit us, please see these directions.

Project highlights

Metanome: Big Data Profiling

Metis: Data Quality Assessment

Janus: Change exploration

KITQAR: AI and Data Quality

Similarity Search

News

17.11.2025 | New book chapter about "Data Quality for Enterprise AI" published

01.11.2025 | Paper accepted at WOP@ISWC

29.09.2025 | Paper accepted at NeurIPS 2025

29.09.2025 | Paper accepted at SIGMOD 2026

09.07.2025 | Paper accepted in SIGMOD Record
