Hasso-Plattner-InstitutSDG am HPI

Data Quality for AI (Wintersemester 2021/2022)

Lecturers: Prof. Dr. Felix Naumann (Information Systems), Dr. Hazar Harmouch (Information Systems)
Course Website:

General Information

  • Weekly Hours: 4
  • Credits: 6
  • Graded: yes
  • Enrolment Deadline: 01.10.2021 - 22.10.2021
  • Teaching Form: Project / Seminar
  • Enrolment Type: Compulsory Elective Module
  • Course Language: English
  • Maximum number of participants: 6

Programs & Modules

IT-Systems Engineering MA
Data Engineering MA
  • DATA-Konzepte und Methoden
  • DATA-Techniken und Werkzeuge
  • DATA-Spezialisierung
  • PREP-Konzepte und Methoden
  • PREP-Techniken und Werkzeuge
  • PREP-Spezialisierung


Many AI methods depend on large quantities of suitable training data. This creates challenges concerning not only the availability of data but also its quality. For example, incomplete, erroneous, inappropriate, or asymmetric training data leads to unreliable models and can ultimately lead to poor decisions, a problem often summarized as "garbage in, garbage out" (GIGO). The traditional definition of data or information quality includes dimensions such as validity, accuracy, completeness, consistency, and uniformity. However, this long-established definition does not yet consider modern AI systems and their requirements. Furthermore, there is little research on the explainability of machine learning models in terms of the quality of their training and test data. In this seminar, we will introduce you to the field of data quality and together explore the correlation between data quality and AI model performance.


For this seminar, participants need to be able to program fluently in Python and know how to use Jupyter notebooks. The seminar also requires basic knowledge of machine learning algorithms.


  • Project seminar with weekly meetings
  • We plan to hold the course on-site. However, we will switch to hybrid or online mode if the regulations change.


  • Intermediate and final presentations
  • Demonstration of the implemented method and a report on the implementation and its experimental results