Hasso-Plattner-Institut
Prof. Dr. Felix Naumann

Description

Information integration is the merging of heterogeneous information from various data sources into a homogeneous, clean dataset. This lecture introduces this ever-important topic. It covers the basic technologies, such as distributed database architectures, techniques for virtual and materialized integration, and data cleansing.

Further Information:

  • Lectures are given in English on demand.
  • Slides are available in the HPI-internal materials folder.
  • The exercises are led by Thorsten Papenbrock.

Schedule

The course takes place on Mondays and Wednesdays at 11am in HS 3. Some sessions will be held as exercises.

Date             Topic
MO 12.10.2015    Introduction
WE 14.10.2015    DIY Integration
MO 19.10.2015    Distribution, Autonomy, and Heterogeneity
WE 21.10.2015    Exercise 1 + Lecture continued
MO 26.10.2015    Distribution, Autonomy, and Heterogeneity
WE 28.10.2015    Materialized and Virtual Integration
MO 02.11.2015    Architectures
WE 04.11.2015    SchemaSQL
MO 09.11.2015    Exercise "Integration Planning"
WE 11.11.2015    Schema Mapping / Schema Matching
MO 16.11.2015    No lecture
WE 18.11.2015    No lecture
MO 23.11.2015    Exercise "Integration Execution"
WE 25.11.2015    Schema Mapping / Schema Matching
MO 30.11.2015    Schema Mapping / Schema Matching
WE 02.12.2015    Global-as-View Modelling (moved to HS 2)
MO 07.12.2015    Local-as-View Modelling
WE 09.12.2015    Local-as-View Modelling
MO 14.12.2015    Exercise "Cleansing"
WE 16.12.2015    Bucket Algorithm (guest lecture by Dr. Armin Roth, IBM)
Christmas break
MO 04.01.2016    Duplicate Detection
WE 06.01.2016    Duplicate Detection and Data Quality
MO 11.01.2016    Data Warehouses
WE 13.01.2016    No lecture
MO 18.01.2016    Exercise "Visualization"
WE 20.01.2016    Data Lineage
MO 25.01.2016    Deep Web
WE 27.01.2016    Deep Web + exam preparation
MO 01.02.2016    Exercise "Final Presentations"
WE 03.02.2016    No lecture
MO 15.02.2016    Written exam in HS 1

Literature

  • Ulf Leser and Felix Naumann: Informationsintegration, dpunkt Verlag, 2006.
    This book is available at the UP library and also, e.g., from Amazon.de.
  • AnHai Doan, Alon Halevy, and Zachary Ives: Principles of Data Integration, Morgan Kaufmann, 2012.
  • M. Tamer Özsu and Patrick Valduriez: Principles of Distributed Database Systems, Springer, 2011.
  • Stefan Conrad: Föderierte Datenbanksysteme, Springer, 1997.

Throughout the lecture, I will refer to various scientific papers that serve as in-depth references.

Exam

A written exam will take place on Feb. 15, 2016, at 13:00 in HS 1.