Hasso-Plattner-Institut
  

Information Systems Group

The research goal of the Information Systems Group is the efficient and effective management of heterogeneous information in large, autonomous systems. This includes methods for data profiling, data cleansing, search, and metadata management. Please also see our blog.

Research topics

The list below gives an overview of our research topics. Further details can be found on our project and our publications pages. In addition, we maintain a repeatability site to publish code and data.

Data Management

  • Data Profiling: When integrating heterogeneous sources, details of the schema, such as keys, functional dependencies, and foreign key dependencies, are often unknown. We are developing efficient and scalable data profiling methods to automatically detect these and other dependencies in very large databases. Our Metanome project collects various high-efficiency methods into a common framework.
    Links: Metanome project, Vision paper, ProLOD++ tool
    Completed projects:  Spider algorithm, Aladin project
    German: ProLOD seminar, Advanced data profiling seminar, lecture
  • Data quality / information quality: The quality of data is measured in many different dimensions. Quality values can be aggregated along data operations, for instance to calculate the quality of query results.
    Links: ICIQ 2009
    German: "Datenqualität" (data quality) keyword article in Informatik Spektrum
  • Duplicate detection: Duplicates are multiple, different representations of the same real-world object, for instance, multiple records of a customer in a CRM database. We try to build systems that efficiently and effectively find such duplicates in large data sets.
    Links: Synthesis lecture, repeatability, DuDe
    German: Duplicate detection explained for a general audience
  • Linked Open Data (LOD): More and more sources provide data in RDF form as linked open data. Such data serves as a use case in a variety of our projects.
    Links: HPI's open data activities, ProLOD
  • Data Fusion: Data fusion is the process of fusing multiple records representing the same real-world object, i.e., duplicates, into a single, consistent, and clean representation. Challenges are scalability over large data volumes and conflict resolution of contradictory values. 
    Links: FuSem, Hummer, ACM computing survey, VLDB tutorial
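
To make three of the topics above concrete, here is a minimal Python sketch of checking a functional dependency, finding duplicate candidates, and fusing them. It is only an illustration under simplifying assumptions, not the method used in Metanome, DuDe, or any other tool listed here: the function names, the sample `customers` records, the string-similarity threshold of 0.8, and the majority-vote conflict resolution are all hypothetical choices, and the pairwise comparison is quadratic, so it would not scale to the large data sets mentioned above.

```python
from collections import Counter
from difflib import SequenceMatcher
from itertools import combinations


def fd_holds(rows, lhs, rhs):
    """Check whether the functional dependency lhs -> rhs holds in rows."""
    seen = {}
    for row in rows:
        key = tuple(row[a] for a in lhs)
        value = tuple(row[a] for a in rhs)
        # The first value seen for a key is stored; any later mismatch violates the FD.
        if seen.setdefault(key, value) != value:
            return False
    return True


def find_duplicates(rows, attribute, threshold=0.8):
    """Naive all-pairs comparison on one string attribute (assumed threshold)."""
    pairs = []
    for (i, a), (j, b) in combinations(enumerate(rows), 2):
        sim = SequenceMatcher(None, a[attribute].lower(), b[attribute].lower()).ratio()
        if sim >= threshold:
            pairs.append((i, j))
    return pairs


def fuse(records):
    """Fuse duplicates into one record by majority vote, ignoring missing values."""
    fused = {}
    for attr in records[0]:
        values = [r[attr] for r in records if r[attr] is not None]
        fused[attr] = Counter(values).most_common(1)[0][0] if values else None
    return fused


# Hypothetical sample data: the first two records describe the same customer.
customers = [
    {"name": "Jonathan Smith", "zip": "14482", "city": "Potsdam"},
    {"name": "Jonathon Smith", "zip": "14482", "city": "Potsdam"},
    {"name": "Maria Keller", "zip": "10115", "city": "Berlin"},
]

print(fd_holds(customers, ["zip"], ["city"]))   # does zip determine city here?
print(find_duplicates(customers, "name"))       # index pairs of likely duplicates
```

Majority voting is just one simple conflict-resolution strategy; with only two conflicting values it falls back to the first value encountered, so a real system would need a more deliberate rule.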

Web Science

  • Text Mining: The analysis of text data, through which high-quality information can be extracted, is known as text mining. It helps to understand, compare, and categorize vast quantities of textual data.
  • Information Retrieval: Providing access to information was for a long time the task of libraries. With the rise of the Web, search engines became an everyday tool for everyone. Information retrieval deals with searching and finding information not only on the Web, but also in digital libraries and other information systems.
  • Entity Linking: Finding and extracting entities from textual documents is important for a variety of applications. Entity linking is a new research area that deals with linking these entities to entries in knowledge bases.
    Links: Entity Linking
  • Recommender Systems: With the huge amount of information available today, recommender systems play an increasing role in everyday life. They enable personalized filtering of, e.g., news, products, or Web content.
  • Social Network Analysis: Social networks, such as Facebook or Twitter, connect people and content with each other. Understanding these connections and the flow of information in a network is relevant for many application areas, e.g., advertising, emergency response, or community detection.

Teaching

  • Bachelor: We offer regular German-language lectures on database systems, namely Datenbanksysteme I (DBS I) and Datenbanksysteme II (DBS II). In addition, we offer a regular introductory seminar on selected database topics and occasional project-oriented seminars.
    One-year bachelor projects with 6-8 students conclude the bachelor studies at HPI. Our group offers one or two such projects per year in cooperation with external partners.
  • Master: We frequently offer courses in "Information Integration", "Data Profiling", "Search Engines", and "Information Retrieval". In addition, we offer diverse specialized seminars, some theoretical, some project-oriented.
    Half-year master projects with 3-6 students examine a specific research question, usually resulting in a submission to an international conference. Half-year master's theses are the final step before graduation.

Annual reports

2011, 2012, 2013 (annual reports in German)