Find-QID: Quasi-Identifier Discovery in Large High-Dimensional Data

This project studies privacy in large high-dimensional datasets, a class of data that is characteristically challenging to anonymise using standard syntactic methods. We approach the problem from the viewpoint of effectively searching for and eliminating all the quasi-identifiers (QIDs) present in such a dataset, as a means of discovering all of its privacy-violating elements. Application areas for this work include large genome data, health data, online shopping data, and data meshes. Code and data can also be found here (genome data) and here (personalised education).
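To make the notion of a quasi-identifier concrete, here is a minimal brute-force sketch in Python. It is an illustration only and not the project's algorithm: it assumes a QID is a minimal combination of attributes whose values single out at least one record, and the function name `find_minimal_qids` and the toy records are hypothetical.

```python
from collections import Counter
from itertools import combinations


def find_minimal_qids(records, max_size=None):
    """Return minimal attribute combinations under which some record is unique.

    Assumption for this sketch: a combination counts as a QID as soon as
    one record has a value tuple that no other record shares.
    """
    if not records:
        return []
    attributes = sorted(records[0].keys())
    max_size = max_size or len(attributes)
    qids = []
    for size in range(1, max_size + 1):
        for combo in combinations(attributes, size):
            # Supersets of a known QID are not minimal, so skip them.
            if any(set(q) <= set(combo) for q in qids):
                continue
            # Count how many records share each value tuple for this combo.
            counts = Counter(tuple(r[a] for a in combo) for r in records)
            if any(c == 1 for c in counts.values()):
                qids.append(combo)
    return qids


# Hypothetical toy data: no single attribute is unique, but some pairs are.
records = [
    {"zip": "10115", "age": 34, "sex": "F"},
    {"zip": "10115", "age": 34, "sex": "M"},
    {"zip": "10117", "age": 41, "sex": "F"},
    {"zip": "10117", "age": 41, "sex": "M"},
]

print(find_minimal_qids(records))
```

In the toy data no single attribute identifies anyone, yet the pairs (age, sex) and (sex, zip) each single out every record. The number of candidate combinations grows exponentially with the number of attributes, which is why exhaustive search of this kind becomes impractical on high-dimensional data.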

Algorithms, Code, and Data

Algorithms, Code, and Datasets:

Attribute Compartmentation
Semi-Synthetic Genome Data
Personalised Education Data
Parallel QID Discovery

Research

Active:

Anne Kayem (Project Lead)
Nikolai J. Podlesny (QID Discovery Algorithms)