Information Integration
Description
Information integration is the merging of heterogeneous information from various data sources into a homogeneous, clean dataset. This lecture introduces this ever-important topic. It covers the basic technologies, such as distributed database architectures, techniques for virtual and materialized integration, and data cleansing technologies.
Further Information:
- Lectures are given in English on demand.
- Slides are available in the HPI-internal materials folder.
- The exercises are led by Thorsten Papenbrock.
Schedule
The course takes place on Mondays and Wednesdays at 11 am in HS 3. Some sessions will be held as exercises.
| Date | Topic |
|---|---|
| MO 12.10.2015 | Introduction |
| WE 14.10.2015 | DIY Integration |
| MO 19.10.2015 | Distribution, Autonomy, and Heterogeneity |
| WE 21.10.2015 | Exercise 1 + Lecture continued |
| MO 26.10.2015 | Distribution, Autonomy, and Heterogeneity |
| WE 28.10.2015 | Materialized and Virtual Integration |
| MO 02.11.2015 | Architectures |
| WE 04.11.2015 | SchemaSQL |
| MO 09.11.2015 | Exercise "Integration planning" |
| WE 11.11.2015 | Schema Mapping / Schema Matching |
| MO 16.11.2015 | No lecture |
| WE 18.11.2015 | No lecture |
| MO 23.11.2015 | Exercise "Integration Execution" |
| WE 25.11.2015 | Schema Mapping / Schema Matching |
| MO 30.11.2015 | Schema Mapping / Schema Matching |
| WE 02.12.2015 | Global-as-View Modelling - Moved to HS2 |
| MO 07.12.2015 | Local-as-View Modelling |
| WE 09.12.2015 | Local-as-View Modelling |
| MO 14.12.2015 | Exercise "Cleansing" |
| WE 16.12.2015 | Bucket Algorithm - Guest lecture by Dr. Armin Roth (IBM) |
| Christmas break | |
| MO 04.01.2016 | Duplicate Detection |
| WE 06.01.2016 | Duplicate Detection and Data Quality |
| MO 11.01.2016 | Data Warehouses |
| WE 13.01.2016 | No lecture |
| MO 18.01.2016 | Exercise "Visualization" |
| WE 20.01.2016 | Data Lineage |
| MO 25.01.2016 | Deep Web |
| WE 27.01.2016 | Deep Web + exam preparation |
| MO 01.02.2016 | Exercise "Final presentations" |
| WE 03.02.2016 | No lecture |
| MO 15.02.2016 | Written exam in HS 1 |
Literature
- Ulf Leser and Felix Naumann: Informationsintegration, dpunkt Verlag, 2006.
  This book is available at the UP library and also, e.g., from Amazon.de.
- Doan, Halevy, and Ives: Principles of data integration, Morgan Kaufmann, 2012.
- Özsu and Valduriez: Principles of distributed database systems, Springer, 2011.
- Stefan Conrad: Föderierte Datenbanksysteme, Springer, 1997.
Throughout the lecture I will refer to various scientific papers that serve as in-depth references.
Exam
A written exam will take place on Feb. 15, 2016 at 13:00 in HS 1.