Outlier Detection for Privacy Preserving Data Analytics (OuTTexT)

In this project, we consider outlier detection, an unsupervised anomaly detection approach, for discovering abnormal or unusual observations in datasets that could be exploited to reveal private information. We study the problems that arise in discovering personal information in various representations of data, specifically textual and image data, which are rich sources of information. Such datasets are often very large and high-dimensional, making it necessary to develop accurate and highly efficient outlier detection models and algorithms. To identify anomalies in text successfully, we investigate and evaluate stylistic and linguistic features used to characterize textual data. Clustering algorithms are then used to measure similarity, and data points that are distinct from the clusters are evaluated as potentially anomalous (privacy violating).
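The pipeline described above (stylistic features, then clustering, then flagging points that fall outside the clusters) can be illustrated with a minimal sketch. This is not the project's implementation. The four features, the function names (stylistic_features, standardize, dbscan_noise), the sample texts, and the eps and min_pts values are all assumptions chosen for illustration, and the noise rule of density-based clustering (DBSCAN) stands in for whichever clustering algorithms the project actually uses.

```python
import math
import string


def stylistic_features(text):
    """Map a text to a small stylistic feature vector (illustrative feature choice)."""
    words = text.split()
    n_words = max(len(words), 1)  # avoid division by zero on empty input
    n_chars = max(len(text), 1)
    return [
        sum(len(w) for w in words) / n_words,                  # mean word length
        len({w.lower() for w in words}) / n_words,             # type-token ratio
        sum(c.isdigit() for c in text) / n_chars,              # digit fraction
        sum(c in string.punctuation for c in text) / n_chars,  # punctuation fraction
    ]


def standardize(rows):
    """Z-score each column so that no single feature dominates the distance."""
    cols = list(zip(*rows))
    means = [sum(c) / len(c) for c in cols]
    # A constant column has zero spread; fall back to 1.0 to avoid dividing by zero.
    stds = [
        math.sqrt(sum((x - m) ** 2 for x in c) / len(c)) or 1.0
        for c, m in zip(cols, means)
    ]
    return [[(x - m) / s for x, m, s in zip(row, means, stds)] for row in rows]


def dbscan_noise(points, eps, min_pts):
    """Return indices of points that DBSCAN would label as noise.

    A point is a core point if at least min_pts points (itself included) lie
    within eps of it. A point is noise if it is not a core point and has no
    core point within eps.
    """
    n = len(points)
    neighbors = [
        [j for j in range(n) if math.dist(points[i], points[j]) <= eps]
        for i in range(n)
    ]
    core = {i for i in range(n) if len(neighbors[i]) >= min_pts}
    return [
        i for i in range(n)
        if i not in core and not core.intersection(neighbors[i])
    ]


# Hypothetical sample records: five ordinary sentences and one digit-heavy string.
texts = [
    "The meeting was moved to next week.",
    "Please send the final report today.",
    "Our team will review your draft soon.",
    "Thanks for sharing those notes with everyone.",
    "Lunch is planned for Friday at noon.",
    "4111 1111 1111 1111 0042 9981",
]

points = standardize([stylistic_features(t) for t in texts])
flagged = dbscan_noise(points, eps=2.5, min_pts=3)  # eps and min_pts are assumed values
print("Potentially anomalous records:", flagged)
```

In this toy setting the digit-heavy record is the one expected to be isolated, since its digit fraction alone places it far from the ordinary sentences after standardization. On real data, the feature set and the eps and min_pts parameters would need to be tuned, and a pairwise-distance computation like this one would not scale to the large, high-dimensional datasets the project targets.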

Algorithms, Code, and Researchers

Code Repository:


  • Active:
    • Anne Kayem (Project Lead)
    • Md. Hasan Shahriar (PII Discovery)
  • Past:
    • Md. Rakibul Islam (Outlier Detection)