Hasso-Plattner-Institut
Prof. Dr. Felix Naumann
 

Information Integration

Information integration is the merging of heterogeneous information from various data sources into a homogeneous, clean dataset. This lecture introduces this ever-important topic. It covers the basic technologies, such as distributed database architectures, techniques for virtual and materialized integration, and data cleansing technologies.

Further Information:

  • Lectures can be given in English on request.
  • Slides will be made available on the HPI-internal "Materials" folder: navigate to FG-Informationssysteme - VL-Informationsintegration - 2022.
  • A recording of the previous edition (winter 2019/20) can be found on tele-task (in German).

Exercises:

  • The exercises are led by Tobias Bleifuß in collaboration with bakdata.
  • Passing the exercise is required for admission to the exam.
  • In the exercises, you will work in small teams to build an information platform for corporate data.
  • We recommend using either Java or Python for your implementation.
  • We will kick off the exercises on April 28, 2022, together with bakdata.

Schedule

The course will take place Tuesdays at 15:15 and Thursdays at 9:15 in L-E.03. Some sessions will take the form of exercises.

If needed, we will stream the lecture via Zoom; you can find the link in our "Materials" folder (navigate to FG-Informationssysteme - VL-Informationsintegration - 2022).

Date           Topic
Tue 19.4.2022  Introduction
Thu 21.4.2022  Introduction and Heterogeneity
Tue 26.4.2022  Heterogeneity
Thu 28.4.2022  Exercise with bakdata
Tue 03.5.2022  Heterogeneity
Thu 05.5.2022  Materialized and virtual integration
Tue 10.5.2022  No lecture
Thu 12.5.2022  Exercise in HS3 (no lecture)
Tue 17.5.2022  Architectures
Thu 19.5.2022  Multidatabase Query Languages
Tue 24.5.2022  Schema Mapping / Schema Matching
Thu 26.5.2022  Ascension Day
Tue 31.5.2022  Schema Mapping / Schema Matching
Thu 02.6.2022  Exercise
Tue 07.6.2022  Schema Mapping / Schema Matching
Thu 09.6.2022
Tue 14.6.2022  Exercise (provisional date); no lecture*
Thu 16.6.2022
Tue 21.6.2022
Thu 23.6.2022  In L-1.02
Tue 28.6.2022
Thu 30.6.2022
Tue 05.7.2022
Thu 07.7.2022  Exercise (provisional date)
Tue 12.7.2022
Thu 14.7.2022
Tue 19.7.2022  No lecture*
Thu 21.7.2022  Exercise (provisional date); no lecture*
Tue 26.7.2022
Thu 28.7.2022

* No lecture, but potentially an exercise session.

Literature & Exam

  • Ulf Leser and Felix Naumann: Informationsintegration, dpunkt Verlag, 2006 (free pdf).
    This book is available at the UP library and also, e.g., from Amazon.de.
  • Doan, Halevy, and Ives: Principles of data integration, Morgan Kaufmann, 2012.
  • Özsu and Valduriez: Principles of distributed database systems, Springer, 2011.
  • Stefan Conrad: Föderierte Datenbanksysteme, Springer, 1997.

Throughout the lecture, I will refer to various scientific papers that serve as in-depth references.

Depending on the number of participants, we will conduct a written or an oral exam.