Information Integration
Information integration is the merging of heterogeneous information from various data sources into a homogeneous, clean dataset. This lecture introduces this ever-important topic. It will cover the basic technologies, such as distributed database architectures, techniques for virtual and materialized integration, and data cleansing technologies.
Further Information:
- Lectures can be given in English on request.
- Slides will be made available in the HPI-internal materials folder.
- The lectures will be recorded by tele-task.
- The exercises are led by Tim Repke.
Schedule
The course takes place on Mondays in F-E.06 and on Thursdays in H-2.57 at 09:15 AM. Some sessions will be held as exercises.
To download the lecture slides, please click the links below.
| Date | Topic |
|---|---|
| MO 14.10. | No lecture - HPI Plenary Meeting |
| TH 17.10. | Exercise: Organization and Task Introduction |
| MO 21.10. | Introduction |
| TH 24.10. | Distribution, Autonomy, and Heterogeneity |
| MO 28.10. in F-E.06 | Exercise: Extracted Datasets |
| Reformation Day | -- |
| MO 04.11. | Distribution, Autonomy, and Heterogeneity |
| TH 07.11. | Materialized and Virtual Integration |
| MO 11.11. | Exercise: Project Task Definition (analytics question) |
| TH 14.11. in F-E.06 | Web Table Research (Hazar Harmouch); Architectures (Felix Naumann) |
| MO 18.11. | no lecture |
| TH 21.11. | no lecture |
| MO 25.11. | Architectures & SchemaSQL |
| TH 28.11. | SchemaSQL |
| MO 02.12. | Schema Matching |
| TH 05.12. | Exercise: Data Integration (schema matching, transformation, normalization) |
| MO 09.12. | Schema Mapping |
| TH 12.12. | Global-as-View |
| MO 16.12. | Local-as-View |
| TH 19.12. | Local-as-View |
| Christmas break | -- |
| MO 06.01. | Bucket Algorithm (Dr. Armin Roth, Universität Tübingen) |
| TH 09.01. | Exercise: Data Cleansing (duplicate detection, linkage, data fusion) |
| MO 13.01. | Duplicate Detection |
| TH 16.01. | Duplicate Detection |
| MO 20.01. | Duplicate Detection |
| TH 23.01. | Information Quality |
| MO 27.01. | no lecture |
| TH 30.01. | Scalable Data Cleansing (Dr. Jorge Quiane-Ruiz, TU Berlin) |
| MO 03.02. | Exercise: Analytics (visualizations, etc., to answer the initial question) |
| TH 06.02. | Exam Preparation |
| Feb. 11 and 12 | Oral exams |
Office Hours
If you have any questions relating to the lecture or exercise, feel free to contact Tim Repke or come by during the office hours:
Every Monday, 14:00 - 15:00
Room F-2.07
Exceptions:
- 09.11. (any other day that week on request)
- 23.12. - 05.01.
Literature & Exam
- Ulf Leser and Felix Naumann: Informationsintegration, dpunkt Verlag, 2006 (free pdf).
This book is available at the UP library and also, e.g., from Amazon.de.
- Doan, Halevy, and Ives: Principles of data integration, Morgan Kaufmann, 2012.
- Özsu and Valduriez: Principles of distributed database systems, Springer, 2011.
- Stefan Conrad: Föderierte Datenbanksysteme, Springer, 1997.
Throughout the lecture, I will refer to various scientific papers that serve as in-depth references.
Oral exams (30 min) will take place on February 11 and 12, 2020. Please contact Diana Stephan and check the Doodle poll regarding the schedule.