Hasso-Plattner-Institut
Prof. Dr. Felix Naumann
  
 

Information Integration

Information integration is the merging of heterogeneous information from various data sources into a homogeneous, clean dataset. This lecture introduces this ever-important topic. It will cover the basic technologies, such as distributed database architectures, techniques for virtual and materialized integration, and data cleansing.

Further Information:

  • Lectures can be given in English on request.
  • Slides will be made available in the HPI-internal materials folder.
  • The lectures will be recorded by tele-task.
  • The exercises are led by Tim Repke.

Schedule

The course will take place on Mondays in F-E.06 and on Thursdays in H-2.57, at 09:15 AM. Some sessions will be exercises rather than lectures.

Date Topic
MO 14.10. No lecture - HPI Plenary Meeting
TH 17.10. Exercise: Organization and Task Introduction
MO 21.10. Introduction
TH 24.10. Distribution, Autonomy, and Heterogeneity
MO 28.10. in F-E.06 Exercise: Extracted Datasets
TH 31.10. No lecture - Reformation Day
MO 04.11. Distribution, Autonomy, and Heterogeneity
TH 07.11. Materialized and Virtual Integration
MO 11.11. Exercise: Project Task Definition (analytics question)
TH 14.11. in F-E.06 Web Table Research (Hazar Harmouch) and Architectures (Felix Naumann)
MO 18.11. Architectures
TH 21.11. No lecture
MO 25.11.  
TH 28.11.  
MO 02.12.  
TH 05.12. Exercise: Data Integration (schema matching, transformation, normalization)
MO 09.12.  
TH 12.12.  
MO 16.12.  
TH 19.12.  
Christmas break --
MO 06.01.  
TH 09.01. Exercise: Data Cleansing (duplicate detection, linkage, data fusion)
MO 13.01.  
TH 16.01.  
MO 20.01.  
TH 23.01.  
MO 27.01.  
TH 30.01.  
MO 03.02. Exercise: Analytics (visualizations etc. to answer the initial analytics question)
TH 06.02. Exam Preparation
tbd Written Exam in HS 1

Office Hours

If you have any questions regarding the lecture or the exercises, feel free to contact Tim Repke or come by during office hours:

Every Monday, 14:00 - 15:00
Room F-2.07

Exceptions:

  • 09.11. (any other day that week on request)
  • 23.12. - 05.01.

Literature & Exam

  • Ulf Leser and Felix Naumann: Informationsintegration, dpunkt Verlag, 2006 (free PDF).
    The book is also available at the UP library and from booksellers such as Amazon.de.
  • Doan, Halevy, and Ives: Principles of Data Integration, Morgan Kaufmann, 2012.
  • Özsu and Valduriez: Principles of Distributed Database Systems, Springer, 2011.
  • Stefan Conrad: Föderierte Datenbanksysteme, Springer, 1997.

Throughout the lecture, I will refer to various scientific papers that serve as in-depth references.

A written exam will take place on XXX in YYY.