Hasso-Plattner-Institut · 25 Years of HPI

Data Integration (Summer Semester 2024)

Lecturers: Prof. Dr. Felix Naumann (Information Systems), Sebastian Schmidl (Information Systems)
Course Website: https://hpi.de/en/naumann/teaching/current-courses/ss-2024/data-integration.html

General Information

  • Weekly Hours: 4
  • Credits: 6
  • Graded: yes
  • Enrolment Deadline: 01.04.2024-30.04.2024
  • Examination time §9 (4) BAMA-O: 29.07.2024
  • Teaching Form: Lecture / Exercise
  • Enrolment Type: Compulsory Elective Module
  • Course Language: English

Programs, Module Groups & Modules

IT-Systems Engineering MA
Data Engineering MA
Software Systems Engineering MA
  • HPI-SSE-S Systems Foundations
  • DSYS: Data-Driven Systems
    • HPI-DSYS-C Concepts and Methods
  • DSYS: Data-Driven Systems
    • HPI-DSYS-T Technologies and Tools
  • DSYS: Data-Driven Systems
    • HPI-DSYS-S Specialization


Data integration is the merging of heterogeneous information from various data sources to a homogenous, clean dataset. Despite research and development over the past 40 years, collecting and integrating data from multiple sources remains an important and challenging task in any data-oriented or data science project. This lecture covers the basic technologies, such as distributed database architectures, techniques for virtual and materialized integration, data profiling, and data cleansing technologies. It thus combines the previous foundational lectures on information integration and data profiling to lay a foundation for handling unknown data.

Requirements

  • Database knowledge (e.g., DBS I)

Learning

Lecture and exercises

Examination

The grade is based entirely on a written exam (approx. 3 hours) held after the end of the teaching period. Requirements for exam admission are:

  • "Passing" all four exercises
  • At least one short presentation of an exercise solution