Hasso-Plattner-Institut
Prof. Dr. Felix Naumann
  
 

Publications (sorted in inverse chronological order)

Semi-Supervised Consensus Clustering: Reducing Human Effort

Vogel, Tobias; Naumann, Felix in Proceedings of the International Workshop on Data Integration and Applications 2014 .

Machine-based clustering yields fuzzy results. For example, when detecting duplicates in a dataset, different tools might end up with different clusterings. Eventually, a decision needs to be made, defining which records are in the same cluster, i. e., are duplicates. Such a definitive result is called a Consensus Clustering and can be created by evaluating the clustering attempts against each other and only resolving the disagreements by human experts. Yet, there can be different consensus clusterings, depending on the choice of disagreements presented to the human expert. In particular, they may require a different number of manual inspections. We present a set of strategies to select the smallest set of manual inspections to arrive at a consensus clustering and evaluate their efficiency on a set of real-world and synthetic datasets.
SemiSupervisedConsensusClustering.pdf
Further Information
Tags isg