Topics in Data Privacy (Winter Semester 2020/2021)
Lecturer: Dr. Anne Kayem (Internet-Technologien und -Systeme)
General Information
- Weekly Hours: 2
- Credits: 3
- Graded: yes
- Enrolment Deadline: 01.10.-20.11.2020
- Teaching Form: Lecture / Exercise
- Enrolment Type: Compulsory Elective Module
- Course Language: English
- Maximum number of participants: 10
Programs, Module Groups & Modules
- Cybersecurity
- HPI-CS-PE Data Protection & Ethics
- DSEC: Data Security
- DSEC-Concepts and Methods
- DSEC: Data Security
- DSEC-Techniques and Tools
- DSEC: Data Security
- HDAS: Health Data Security
- HPI-HDAS-C Concepts and Methods
- HDAS: Health Data Security
- HPI-HDAS-T Technologies and Methods
- HDAS: Health Data Security
- HPI-HDAS-S Specialization
- SCAD: Scalable Computing and Algorithms for Digital Health
- HPI-SCAD-C Concepts and Methods
- SCAD: Scalable Computing and Algorithms for Digital Health
- HPI-SCAD-T Technologies and Tools
- SCAD: Scalable Computing and Algorithms for Digital Health
- HPI-SCAD-S Specialization
Description
The course is aimed at students interested in methods for transforming data to protect against the exposure of sensitive information in publicly shared datasets. These methods centre on techniques that remove personally identifying information while maintaining data integrity and consistency, so that the data can still be processed and shared.
In this course, we study some of the methods used to transform data in order to protect against sensitive data exposure. This is useful when datasets are shared either publicly or with multiple parties that have diverse interests. Examples of such scenarios include privacy-preserving data analytics / mining and privacy-preserving machine learning. We will focus in particular on methods of identifying and eliminating personally identifying information from datasets, while maintaining data integrity and consistency.
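As a rough illustration of the kind of transformation meant here, the Python sketch below drops a direct identifier, coarsens two quasi-identifiers, and then reports the size of the smallest group of records that share the same quasi-identifier values (the "k" in k-anonymity, one common way to quantify such a release). The records, field names, and generalisation rules are invented for this example and are not taken from the course material.

```python
from collections import Counter

# Toy records; all names and values are hypothetical.
RECORDS = [
    {"name": "Alice", "age": 34, "zip": "14482", "diagnosis": "flu"},
    {"name": "Bob", "age": 37, "zip": "14469", "diagnosis": "asthma"},
    {"name": "Carol", "age": 52, "zip": "10115", "diagnosis": "flu"},
    {"name": "Dave", "age": 58, "zip": "10117", "diagnosis": "diabetes"},
]


def generalise(record):
    """Drop the direct identifier (name) and coarsen the quasi-identifiers."""
    decade = record["age"] // 10 * 10
    return {
        "age": f"{decade}-{decade + 9}",      # e.g. 34 becomes "30-39"
        "zip": record["zip"][:2] + "***",     # keep only the first two digits
        "diagnosis": record["diagnosis"],     # sensitive value is kept as-is
    }


def smallest_group(records, quasi_identifiers):
    """Size of the smallest group sharing the same quasi-identifier values."""
    groups = Counter(tuple(r[q] for q in quasi_identifiers) for r in records)
    return min(groups.values())


released = [generalise(r) for r in RECORDS]
k = smallest_group(released, ["age", "zip"])
```

In this toy data every raw record is unique on age and postcode, whereas after generalisation each record shares its quasi-identifier values with at least one other record. The trade-off is that the coarser values are less useful for analysis.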
Topics to be covered include: (1) methods of transforming data to enable privacy-preserving data analytics; (2) weaknesses/vulnerabilities that enable sensitive data exposure; (3) record linkage and de-identification; and (4) outlier detection and analysis.
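To make topic (3) more concrete, the following Python sketch shows how record linkage can undo a naive de-identification: a release with names removed is joined to a second dataset that still carries names, using shared quasi-identifiers. Both datasets, the field names, and the helper `link` are hypothetical and only serve as an illustration.

```python
# Hypothetical "de-identified" release: names removed, quasi-identifiers kept.
released = [
    {"birth_year": 1986, "zip": "14482", "sex": "F", "diagnosis": "flu"},
    {"birth_year": 1986, "zip": "14482", "sex": "M", "diagnosis": "asthma"},
    {"birth_year": 1990, "zip": "10115", "sex": "F", "diagnosis": "diabetes"},
]

# Hypothetical public register that still carries names.
register = [
    {"name": "Alice", "birth_year": 1986, "zip": "14482", "sex": "F"},
    {"name": "Bob", "birth_year": 1986, "zip": "14482", "sex": "M"},
    {"name": "Erin", "birth_year": 1975, "zip": "10115", "sex": "F"},
]

KEYS = ("birth_year", "zip", "sex")


def link(released_rows, register_rows, keys=KEYS):
    """Return (name, diagnosis) pairs for rows that match exactly one person."""
    index = {}
    for person in register_rows:
        key = tuple(person[k] for k in keys)
        index.setdefault(key, []).append(person["name"])

    matches = []
    for row in released_rows:
        names = index.get(tuple(row[k] for k in keys), [])
        if len(names) == 1:  # a unique match re-identifies the record
            matches.append((names[0], row["diagnosis"]))
    return matches


reidentified = link(released, register)
```

Two of the three released rows match exactly one person in the register and are therefore re-identified; the third has no counterpart. Exact matching is the simplest case, and real linkage typically has to cope with noisy or partially matching values.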
Legislation such as the General Data Protection Regulation (GDPR) means that applications arise in almost every field or service that processes sensitive data. The examples that are perhaps easiest to relate to include data analytics in domains such as healthcare, cyber-physical systems (e.g. smart homes), image processing, social media, and online marketing platforms.
===
Syllabus
===
Block 1: Overview and Fundamentals
- Lecture 1: (03.11.2020) - Course Overview
- Lecture 2: (10.11.2020) - Data Pseudonymisation and Anonymisation
--------
Block 2: Preventing Data De-Anonymisation
- Lecture 3: (17.11.2020) - Probabilistic Models for Outlier Detection
- Lecture 4: (24.11.2020) - Proximity (or Similarity)-Based Outlier Detection
- Lecture 5: (01.12.2020) - High-Dimensional Outlier Detection
- Lecture 6: (08.12.2020) - Supervised Outlier Detection
- Lecture 7: (15.12.2020) - Project work discussions
--------
-- Christmas Break (21.12.2020 - 01.01.2021) --
--------
Block 3: Data Types for Analysis
- Lecture 8: (05.01.2021) - Categorical, Text, and Mixed Attribute Data
- Lecture 9: (12.01.2021) - Time Series and Streaming Outlier Detection
- Lecture 10: (19.01.2021) - Outlier Detection in Graphs and Networks
- Lecture 11: (26.01.2021) - Project work discussions
--------
Block 4: Project-work Presentations & Report
- Lecture 12: (02.02.2021) - Group Presentations I
- Lecture 13: (09.02.2021) - Group Presentations II
- Report Hand-in (15.03.2021)
--------
Requirements
While there are no strict prerequisites for this course, you might find it helpful to have a background in IT Systems Engineering.
Literature
- Supporting reading material will be provided on a per-lecture basis.
- Some texts to consult include:
- Charu C. Aggarwal (2016) "Outlier Analysis" (Springer)
- Kishan G. Mehrotra, Chilukuri K. Mohan, HuaMing Huang (2018) "Anomaly Detection: Principles and Algorithms" (Springer)
Learning
- Understand data privacy concepts and definitions
- Learn to critically analyse data privacy algorithms and architectures in relation to data protection
- Learn to identify the advantages and disadvantages of privacy-preserving algorithms in relation to potential de-anonymisation loopholes
- Acquire hands-on experience with re-identifying individuals from seemingly anonymous or innocent data
- Learn to develop and assess privacy protocols, algorithms, and anonymity protection schemes to prevent inferences in shared data
Examination
Participants will work on a coursework project throughout the semester. Grading will be based on a presentation of the coursework project, as well as a report (5-6 pages or 2500-3000 words maximum) detailing the procedures (code, methods, design, etc.) used and the results obtained.
A selection of possible project topics will be presented during the first lecture (Tuesday, 03.11.2020 at 13.30). Choices may be made from this selection, but proposals are also welcome provided they are discussed and agreed upon beforehand with the lecturer.
The report grade will count for 70% of the final score, and the presentation for 30%.
Rubric | Number | When? & Where? | Grade % |
Presentations | Groups (3 persons max) | 02.02.2021 - 09.02.2021 (Online - Zoom) | 30% |
Report | 1 (per person) | 15.03.2021 (Online Submission) | 70% |
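The weighting above can be written out as a small Python sketch. The 0-100 scale for the two component scores and the function name `final_score` are assumptions made for this example; the course page does not state the scale on which the components are marked.

```python
# Weights in percent, taken from the rubric above.
REPORT_WEIGHT = 70
PRESENTATION_WEIGHT = 30


def final_score(report, presentation):
    """Weighted final score; both inputs are assumed to be on a 0-100 scale."""
    return (REPORT_WEIGHT * report + PRESENTATION_WEIGHT * presentation) / 100


# Hypothetical component scores, for illustration only.
example = final_score(report=80, presentation=90)
```

With these hypothetical inputs, the report contributes 56 points and the presentation 27, for a final score of 83.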
Dates
Over the Winter Semester (02.11.2020 - 15.02.2021), lectures will be held once a week as follows:
Weekday | Time Slot | Location |
Tuesday | 13.30 - 15.00 | Online (Zoom) |
Lecture materials will be accessible on Moodle.
Note: To participate in the course you must be registered on the University of Potsdam's Moodle platform, and have registered to attend this course. Search for "Topics in Data Privacy" or "TDP" and enroll using "TDP-2020".