Hasso-Plattner-Institut25 Jahre HPI

Privacy Preserving Outlier Detection (Winter Semester 2022/2023)

Lecturer: Dr. Anne Kayem (Internet-Technologien und -Systeme)

General Information

  • Weekly hours per semester: 4
  • ECTS: 6
  • Graded: Yes
  • Enrollment period: 01.10.2022 - 31.10.2022
  • Examination date per §9 (4) BAMA-O: 15.12.2022
  • Teaching format: Seminar / Exercise
  • Enrollment type: Compulsory elective module
  • Language of instruction: English
  • Maximum number of participants: 50

Degree Programs, Module Groups & Modules

Data Engineering MA
Digital Health MA
  • SCAD: Scalable Computing and Algorithms for Digital Health
    • HPI-SCAD-C Concepts and Methods
  • SCAD: Scalable Computing and Algorithms for Digital Health
    • HPI-SCAD-T Technologies and Tools
  • SCAD: Scalable Computing and Algorithms for Digital Health
    • HPI-SCAD-S Specialization
  • HDAS: Health Data Security
    • HPI-HDAS-C Concepts and Methods
  • HDAS: Health Data Security
    • HPI-HDAS-T Technologies and Methods
  • HDAS: Health Data Security
    • HPI-HDAS-S Specialization
Cybersecurity MA
Software Systems Engineering MA


In an increasingly interconnected world in which almost every device is essentially both a data generator and collector, composing large datasets of complex personal information is becoming ever easier. This is in spite of the fact that privacy legislation such as the GDPR provides measures to prohibit the collection and storage of personal data without explicit user consent. A further point of alarm is the growing number of reports in popular media on de-anonymization incidents that have paved the way for related security subversion incidents such as leaks of personal data.

In this seminar, we study several anonymized datasets in an effort to understand why and how de-anonymizations occur. Specifically, we focus on designing reverse-clustering algorithms to discover outlier data points and determine how these can be used, either individually or in combination with auxiliary data, to de-anonymize data points within the original dataset. Finally, we will discuss the properties of the outlier data points in terms of how they enabled the de-anonymizations and which countermeasures could be applied.


Topics to be discussed include the following:

  • Outlier Detection
  • Distance-Based Outlier Detection
  • Clustering-Based Outlier Detection Approaches
  • Model-Based Outlier Detection Approaches
  • Algorithms for Outlier Detection
  • ...
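
To give a flavour of the distance-based approach listed above, here is a minimal sketch and not course material. It scores each record by the distance to its k-th nearest neighbour and flags records whose score exceeds a threshold. The function names, the toy records, and the values of `k` and `threshold` are illustrative assumptions only.

```python
import math


def knn_distance_scores(points, k=2):
    """Score each point by the Euclidean distance to its k-th nearest neighbour."""
    if not 1 <= k < len(points):
        raise ValueError("k must be between 1 and len(points) - 1")
    scores = []
    for i, p in enumerate(points):
        # Distances from p to every other point, smallest first.
        dists = sorted(math.dist(p, q) for j, q in enumerate(points) if j != i)
        scores.append(dists[k - 1])
    return scores


def find_outliers(points, k=2, threshold=3.0):
    """Return the indices of points whose k-NN distance exceeds the threshold."""
    scores = knn_distance_scores(points, k)
    return [i for i, score in enumerate(scores) if score > threshold]


# Made-up 2-D records (e.g. two scaled quasi-identifiers); the last one is isolated.
records = [(1.0, 1.0), (1.2, 0.9), (0.8, 1.1), (1.1, 1.3), (8.0, 9.0)]
outlier_indices = find_outliers(records, k=2, threshold=3.0)
```

In a privacy setting, records flagged this way are candidates for re-identification, because few or no other records resemble them. A suitable threshold depends on the data and on how the attributes are scaled.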


There are no prerequisites for this course; however, a background in data mining or machine learning may be helpful.


Reference materials will be provided per lecture and as needed.

Learning and Teaching Formats

At the end of this seminar, you should have some insight into the research field of outlier detection (sometimes also referred to as anomaly detection) algorithms for supporting the generation of privacy-preserving datasets. You will also have studied the conceptual foundations of these algorithms and, through the project work, applied what you learned to example datasets drawn from real-life application areas (e.g. textual data, geo-location data, images, ...).


Grading for the seminar will be based on a mid-semester presentation (20%), a final presentation (30%) and a report (50%). The table below provides a summary:

                                Number              When         Grade
  Mid Semester Presentation     Group Size (2-3)    15.12.2022   20%
  Final Semester Presentation   Group Size (2-3)    09.02.2023   30%
  Final Report                  One (1) per Group   03.03.2023   50%


The first lecture will be held on Wednesday, October 19, 2022.

For the duration of the semester, lectures and project meetings will be held as follows:

  • Wednesdays, 13.30 - 15.00 - Lecture (in A.1.1)
  • Thursdays, 09.15 - 10.45 - Project Meetings (in A1.1 or online via Zoom)

Lecture materials and announcements will be accessible via the HPI Moodle platform.