Information Systems Group

The research goal of the Information Systems Group is the efficient and effective management of heterogeneous information in large, autonomous systems. This includes methods for data profiling, data cleansing, search, and metadata management. Please also see our blog.

Research topics

The list below gives an overview of our research topics. Further details can be found on our project and our publications pages. In addition, we maintain a repeatability site to publish code and data.

  • Data Profiling: When integrating heterogeneous sources, details of the schema, such as keys, functional dependencies, and foreign key dependencies, are often unknown. We are developing efficient and scalable data profiling methods to automatically detect these and other dependencies in very large databases. Our Metanome project collects various high-efficiency methods into a common framework.
    Links: Metanome project, Vision paper, ProLOD++ tool
    Completed projects: Spider algorithm, Aladin project
    German: ProLOD seminar, Advanced data profiling seminar, lecture
  • Data quality / information quality: The quality of data is measured in many different dimensions. Quality values can be aggregated along data operations, for instance to calculate the quality of query results.
    Links: ICIQ 2009
    German: Schlagwort "Datenqualität" im Informatik Spektrum
  • Duplicate detection: Duplicates are multiple, different representations of the same real-world object, for instance, multiple records of a customer in a CRM database. We try to build systems that efficiently and effectively find such duplicates in large data sets.
    Links: Synthesis lecture, repeatability, DuDe
    German: Duplikaterkennung allgemeinverständlich
  • Linked Open Data (LOD): More and more sources provide data in RDF form as linked open data. Such data serves as a use case in a variety of our projects.
    Links: HPI's open data activities, ProLOD
  • Data Fusion: Data fusion is the process of fusing multiple records representing the same real-world object, i.e., duplicates, into a single, consistent, and clean representation. Challenges are scalability over large data volumes and conflict resolution of contradictory values. 
    Links: FuSem, Hummer, ACM computing survey, VLDB tutorial
  • Text Mining: The analysis of text data, through which high-quality information can be extracted, is known as text mining. It helps to understand, compare, and categorize vast quantities of textual data.
    Links: Entity Linking
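
To make the duplicate detection and data fusion topics above more concrete, here is a minimal sketch in Python. It is not taken from any of the group's tools: the function names, the similarity threshold of 0.8, the sample records, and the "keep the longer value" conflict resolution rule are all assumptions chosen for illustration. The naive pairwise comparison is quadratic in the number of records, which is the scalability problem mentioned above.

```python
from difflib import SequenceMatcher
from itertools import combinations


def similarity(a: dict, b: dict) -> float:
    """Average string similarity over the attributes both records share."""
    keys = a.keys() & b.keys()
    scores = [
        SequenceMatcher(None, str(a[k]).lower(), str(b[k]).lower()).ratio()
        for k in keys
    ]
    return sum(scores) / len(scores) if scores else 0.0


def find_duplicates(records, threshold=0.8):
    """Compare all record pairs and return index pairs at or above the
    (assumed) similarity threshold."""
    return [
        (i, j)
        for (i, a), (j, b) in combinations(enumerate(records), 2)
        if similarity(a, b) >= threshold
    ]


def fuse(a: dict, b: dict) -> dict:
    """Fuse two duplicate records into one. Conflicts are resolved with a
    simple, hypothetical rule: keep the longer value per attribute."""
    fused = {}
    for k in a.keys() | b.keys():
        va, vb = str(a.get(k, "")), str(b.get(k, ""))
        fused[k] = va if len(va) >= len(vb) else vb
    return fused


# Made-up customer records; the first two describe the same person.
customers = [
    {"name": "John Smith", "city": "Potsdam"},
    {"name": "Jon Smith", "city": "Potsdam"},
    {"name": "Maria Garcia", "city": "Berlin"},
]

pairs = find_duplicates(customers)
clean = [fuse(customers[i], customers[j]) for i, j in pairs]
print(pairs, clean)
```

Systems for large data sets typically avoid comparing every pair, for example by first grouping records into candidate blocks, and use more careful conflict resolution than the length rule assumed here.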


  • Bachelor: We regularly offer German-language lectures in database systems, namely Datenbanksysteme I (DBS I) and Datenbanksysteme II (DBS II). In addition, we offer the regular seminar "Beauty is our Business" and many other project-oriented seminars.
    One-year Bachelor Projects with 6-8 students conclude the bachelor studies at HPI. Our group offers one or two such projects per year in cooperation with external partners.
  • Master: We frequently offer courses in "Information Integration", "Data Profiling", "Search Engines", and "Information Retrieval". In addition, we offer diverse specialized seminars, some theoretical, some project-oriented.
    Half-year Master Projects with 3-6 students examine a specific research question, usually resulting in a submission to an international conference.

