Private Personal Information Discovery in Data

Determining whether a dataset contains privacy-violating information is important both for enabling privacy-preserving data analytics and for making assertions about the risk of private data exposure in publicly shared datasets. Some of the simpler solutions include examining datasets for personal identifying information (PII) and transforming or eliminating the selected values to remove the risk of private information disclosure, examining user behaviour patterns (e.g. in movement and geolocation data), and identifying rare occurrences of data values. Personal data in unstructured repositories is more difficult to characterise. Both of our current research projects, AutoPII and Find-QID, aim to address this issue. In the AutoPII project we are working on PII discovery algorithms, while the Find-QID project is focused on quasi-identifier discovery.
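To make the idea of identifying rare occurrences of data values concrete, the following is a minimal sketch in Python. It is not the method used in Find-QID or AutoPII. The function name `rare_combinations`, the threshold `k`, and the toy records are all assumptions made for illustration. The sketch counts how many records share each combination of values over a chosen set of columns and reports the combinations that occur fewer than `k` times.

```python
from collections import Counter


def rare_combinations(records, columns, k=2):
    """Return value combinations over `columns` shared by fewer than k records.

    Hypothetical helper for illustration only: a combination that occurs
    fewer than k times may single out an individual.
    """
    # Count how many records share each tuple of values.
    counts = Counter(tuple(record[c] for c in columns) for record in records)
    # Keep only the combinations that are rarer than the threshold.
    return {combo: n for combo, n in counts.items() if n < k}


# Toy records, invented for this example.
records = [
    {"postcode": "7535", "birth_year": 1990, "gender": "F"},
    {"postcode": "7535", "birth_year": 1990, "gender": "F"},
    {"postcode": "7535", "birth_year": 1962, "gender": "M"},
    {"postcode": "8001", "birth_year": 1990, "gender": "F"},
]

rare = rare_combinations(records, ["postcode", "birth_year", "gender"], k=2)
for combo, n in rare.items():
    print(combo, n)
```

On real data the difficulty lies in deciding which columns to combine, since the number of candidate column subsets grows exponentially with the number of attributes.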

Current Projects

  • Find-QID: Quasi-Identifier Discovery in Large High Dimensional Data
  • AutoPII: Automated Personal Identifying Information Discovery in Mesh Data