Hasso-Plattner-Institut (25 Years of HPI)

Data Quality for AI (Winter Semester 2021/2022)

Lecturers: Prof. Dr. Felix Naumann (Information Systems), Dr. Hazar Harmouch (Information Systems)
Course website: https://hpi.de/naumann/teaching/current-courses/ws-21-22/data-quality-for-ai.html

General Information

  • Weekly hours per semester: 4
  • ECTS: 6
  • Graded: Yes
  • Enrollment period: 01.10.2021 - 22.10.2021
  • Course format: Project / Seminar
  • Enrollment type: Compulsory elective module
  • Teaching language: English
  • Maximum number of participants: 6

Degree Programs, Module Groups & Modules

IT-Systems Engineering MA
  • OSIS: Operating Systems & Information Systems Technology
    • HPI-OSIS-K Concepts and Methods
  • OSIS: Operating Systems & Information Systems Technology
    • HPI-OSIS-S Specialization
  • OSIS: Operating Systems & Information Systems Technology
    • HPI-OSIS-T Techniques and Tools
Data Engineering MA


Many AI methods depend on large quantities of suitable training data. This creates challenges concerning not only the availability of data but also its quality. For example, incomplete, erroneous, inappropriate, or asymmetric training data leads to unreliable models and can ultimately lead to poor decisions, a problem often summarized as "garbage in, garbage out" (GIGO). The traditional definition of data or information quality includes dimensions such as validity, accuracy, completeness, consistency, and uniformity. However, this long-established definition does not yet consider modern AI systems and their requirements. Furthermore, there is little research on the explainability of machine learning models in terms of the quality of their training and testing data. In this seminar, we will introduce you to the field of data quality and explore together the correlation between data quality and AI model performance.
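To make two of the dimensions named above more concrete, the following is a minimal sketch of how completeness and validity might be measured on a small table. It is not taken from the course material: the toy records, the function names `completeness` and `validity`, and the plausibility rule for `age` (0 to 120) are all assumptions chosen for illustration.

```python
from __future__ import annotations

from typing import Any, Callable

# Hypothetical toy records; None marks a missing value.
RECORDS = [
    {"age": 34, "country": "DE"},
    {"age": None, "country": "DE"},
    {"age": 29, "country": None},
    {"age": 250, "country": "FR"},  # present, but implausible
    {"age": 41, "country": "US"},
]


def completeness(records: list[dict[str, Any]]) -> float:
    """Fraction of all cells that are not missing (None)."""
    cells = [value for row in records for value in row.values()]
    if not cells:
        return 1.0
    return sum(value is not None for value in cells) / len(cells)


def validity(
    records: list[dict[str, Any]],
    column: str,
    is_valid: Callable[[Any], bool],
) -> float:
    """Fraction of non-missing values in `column` that satisfy `is_valid`."""
    present = [row[column] for row in records if row.get(column) is not None]
    if not present:
        return 1.0
    return sum(is_valid(value) for value in present) / len(present)


def plausible_age(value: Any) -> bool:
    """Assumed rule for this sketch: an age must lie between 0 and 120."""
    return 0 <= value <= 120


if __name__ == "__main__":
    print("completeness:", completeness(RECORDS))
    print("validity(age):", validity(RECORDS, "age", plausible_age))
```

Scores like these only describe the data itself. Relating them to model performance, as the seminar sets out to do, would additionally require training and evaluating a model on data of varying quality.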


For this seminar, participants need to be fluent in Python programming and know how to use Jupyter notebooks. The seminar also requires basic knowledge of machine learning algorithms.

Learning and Teaching Methods

  • Project seminar with weekly meetings
  • We plan to hold the course on-site, but will switch to a hybrid or online mode if regulations change.


Assessment

  • Intermediate and final presentations
  • Demonstration of the implemented method and a report on the implementation and its experimental results