Hasso-Plattner-Institut
Prof. Dr. Felix Naumann
 

Description

Imagine you can help decision makers in large organizations by creating a uniform, cleaned, and trusted data set out of their multitude of heterogeneous information sources. This lecture introduces techniques to address this ever-important topic. It covers different architectures, matching and mapping approaches, query processing, and data cleansing.

We focus on structured information, but briefly discuss how to deal with textual information sources.

The lecture will be given by Dr. Armin Roth (IBM).

Schedule

The lecture will take place during the week of October 9th to 13th, from 09:15 am to 04:45 pm, at Campus III, G-3.E.15/16.

 

Date            Topic
MO 09.10.2017   Introduction, Distribution, Autonomy, Heterogeneity, DIY Integration, Architectures
TU 10.10.2017   Mapping Languages, Schema Matching and Mapping, Global-as-View Modelling
WE 11.10.2017   Local-as-View Modelling, Answering Queries using Views (Bucket Algorithm), Duplicate Detection
TH 12.10.2017   Data Fusion, Data Quality, Data Warehouses, ETL and Data Lineage
FR 13.10.2017   Information Integration on the Web, Knowledge Harvesting, Semantic Search

Literature

  • Ulf Leser and Felix Naumann: Informationsintegration, dpunkt Verlag, 2006.
    This book is available at the UP library and also, e.g., from Amazon.de.
  • Doan, Halevy, and Ives: Principles of Data Integration, Morgan Kaufmann, 2012.
  • Özsu and Valduriez: Principles of Distributed Database Systems, Springer, 2011.
  • Stefan Conrad: Föderierte Datenbanksysteme, Springer, 1997.

Throughout the lecture I will refer to various scientific papers that serve as in-depth references.

Exam

A written exam will take place at the end of the semester; the specific date will be set soon. An earlier exam is possible on request. Please talk to Dr. Armin Roth if you would like to arrange one.