Hasso-Plattner-Institut (HPI)

Topics in Data Privacy (Winter Semester 2020/2021)

Lecturer: Dr. Anne Kayem (Internet-Technologien und -Systeme)

General Information

  • Weekly hours per semester (SWS): 2
  • ECTS: 3
  • Graded: Yes
  • Enrollment period: 01.10.2020 - 20.11.2020
  • Teaching format: Lecture / Exercise
  • Enrollment type: Compulsory elective module
  • Teaching language: English
  • Maximum number of participants: 10

Degree Programs & Modules

Cybersecurity MA
Data Engineering MA
Digital Health MA


The course is aimed at students interested in methods of transforming data to protect against sensitive data exposure from publicly shared datasets. These methods center on techniques that remove personally identifying information from the data while maintaining the integrity and consistency needed to support data processing and sharing operations.

In this course, we study some of the methods used to transform data in order to protect against sensitive data exposure. This is useful when datasets are shared either publicly or with multiple parties with diverse interests. Examples of such scenarios include privacy-preserving data analytics / mining and privacy-preserving machine learning. We will focus in particular on methods of identifying and eliminating personally identifying information from datasets while maintaining data integrity and consistency.

Topics to be covered include: (1) methods of transforming data to enable privacy-preserving data analytics; (2) weaknesses/vulnerabilities that enable sensitive data exposure; (3) record linkage and de-identification; and (4) outlier detection and analysis.
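To make the link between outliers and de-anonymisation more concrete, the following is a minimal Python sketch and is not taken from the course materials. It flags records whose combination of attribute values is rare in a dataset, since such records tend to be easier to single out. The function name `rare_combinations`, the attribute names, the threshold, and the sample records are all hypothetical and chosen only for illustration.

```python
from collections import Counter


def rare_combinations(records, attributes, min_group_size=2):
    """Return the records whose combination of values for `attributes`
    occurs fewer than `min_group_size` times in `records`.

    Hypothetical helper for illustration only.
    """
    def key(record):
        return tuple(record[a] for a in attributes)

    # Count how many records share each combination of attribute values.
    counts = Counter(key(r) for r in records)
    # Keep the records that fall into a group smaller than the threshold.
    return [r for r in records if counts[key(r)] < min_group_size]


# Made-up sample data; the names were removed, but other attributes remain.
records = [
    {"zip": "14482", "birth_year": 1990, "diagnosis": "flu"},
    {"zip": "14482", "birth_year": 1990, "diagnosis": "asthma"},
    {"zip": "14469", "birth_year": 1955, "diagnosis": "diabetes"},
]

flagged = rare_combinations(records, ["zip", "birth_year"])
print(flagged)
```

In this made-up data, the third record is the only one with its combination of `zip` and `birth_year`, so it is the only one flagged. Raising `min_group_size` makes the check stricter.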

Legislation such as the General Data Protection Regulation (GDPR) means that applications arise in almost every field or service that processes sensitive data. Those that are perhaps easiest to relate to include data analytics in domains such as healthcare, cyber-physical systems (e.g. smart homes, ...), image processing, social media, and online marketing platforms.




Block 1: Overview and Fundamentals

  • Lecture 1: (03.11.2020) - Course Overview
  • Lecture 2: (10.11.2020) - Data Pseudonymisation and Anonymisation


Block 2: Preventing Data De-Anonymisation

  • Lecture 3: (17.11.2020) - Probabilistic Models for Outlier Detection
  • Lecture 4: (24.11.2020) - Proximity (or Similarity)-Based Outlier Detection
  • Lecture 5: (01.12.2020) - High-Dimensional Outlier Detection
  • Lecture 6: (08.12.2020) - Supervised Outlier Detection
  • Lecture 7: (15.12.2020) - Project work discussions


-- Christmas Break (21.12.2020 - 01.01.2021) --


Block 3: Data Types for Analysis

  • Lecture 8: (05.01.2021) - Categorical, Text, and Mixed Attribute Data
  • Lecture 9: (12.01.2021) - Time Series and Streaming Outlier Detection
  • Lecture 10: (19.01.2021) - Outlier Detection in Graphs and Networks
  • Lecture 11: (26.01.2021) - Project work discussions


Block 4: Project-work Presentations & Report

  • Lecture 12: (02.02.2021) - Group Presentations I
  • Lecture 13: (09.02.2021) - Group Presentations II
  • Report Hand-in (15.03.2021)



While there are no strict prerequisites for this course, you might find it helpful to have a background in IT Systems Engineering.


  • Supporting reading material will be provided on a per-lecture basis.
  • Some texts to consult include:
    • Charu C. Aggarwal (2016) "Outlier Analysis" (Springer)
    • Kishan G. Mehrotra, Chilukuri K. Mohan, HuaMing Huang (2018) "Anomaly Detection: Principles and Algorithms" (Springer)

Learning and Teaching Methods

  • Understand data privacy concepts and definitions
  • Learn to critically analyse data privacy algorithms and architectures in relation to data protection
  • Learn to identify the advantages and disadvantages of privacy preserving algorithms in relation to potential de-anonymisation loopholes
  • Acquire hands-on experience with re-identifying individuals from seemingly anonymous or innocent data
  • Learn to develop and assess privacy protocols, algorithms, and anonymity protection schemes to prevent inferences in shared data


Participants will work on a coursework project throughout the semester. Grading will be based on a presentation of the coursework project, as well as a report (5-6 pages or 2500-3000 words maximum) detailing the procedures (code, methods, design, etc.) used and the results obtained.

A selection of possible project topics will be presented during the first lecture (Tuesday, 03.11.2020 at 13.30). Choices may be made from this selection, but proposals are also welcome, provided they are discussed and agreed upon beforehand with the lecturer.

The report grade will count for 70% of the final score, and the presentation for 30%.


Rubric          Number                    When? & Where?                        Grade %
Presentations   Groups (3 persons max)    02.02 - 09.02.2021 (Online - Zoom)    30%
Report          1 (per person)            15.03.2021 (Online Submission)        70%


Over the Winter Semester (02.11.2020 - 15.02.2021), lectures will be held once a week as follows:

Weekday    Time Slot        Location
Tuesday    13.30 - 15.00    Online (Zoom)


Lecture materials will be accessible on Moodle.

Note: To participate in the course, you must be registered on the University of Potsdam's Moodle platform and enrolled in this course. Search for "Topics in Data Privacy" or "TDP" and enroll using "TDP-2020".