Hasso-Plattner-Institut
Prof. Dr. Felix Naumann

Dr. Ziawasch Abedjan

former member

Please visit my new page at MIT CSAIL for the most recent news.

Research Activities

Topics

  • Data Profiling
  • Data Mining

Projects

Master's Thesis Co-Supervision

  • Benjamin Emde: "Context-Aware Recommendations in Social Networks", 2012
  • Sven Viehmeier: "Incremental Data Profiling", 2012/2013
  • Patrick Schulze: "Depth-First Discovery of Functional Dependencies", 2013/2014

Master Project Co-Supervision

  • "Global Relevance Scores for DBpedia Facts", 2012/2013

Teaching

Publications

A Hybrid Approach for Efficient Unique Column Combination Discovery

Papenbrock, Thorsten; Naumann, Felix. In Proceedings of the Conference on Database Systems for Business, Technology, and Web (BTW), 2017.

Unique column combinations (UCCs) are groups of attributes in relational datasets in which no combination of values occurs more than once. Hence, they indicate keys and serve data management tasks, such as schema normalization, data integration, and data cleansing. Because the unique column combinations of a particular dataset are usually unknown, UCC discovery algorithms have been proposed to find them. All previous discovery algorithms are, however, inapplicable to datasets of typical real-world size, e.g., datasets with more than 50 attributes and a million records. We present the hybrid discovery algorithm HyUCC, which uses the same discovery techniques as the recently proposed functional dependency discovery algorithm HyFD: a hybrid combination of fast approximation techniques and efficient validation techniques. With it, the algorithm discovers all minimal unique column combinations in a given dataset. HyUCC not only outperforms all existing approaches, but also scales to much larger datasets.
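
To make the notion of a minimal unique column combination concrete, here is a small sketch of a naive level-wise search. It is not HyUCC and does not use its approximation or validation techniques. It simply tests every column subset from smallest to largest and is exponential in the number of columns. The function names `is_unique` and `minimal_uccs` and the sample rows are hypothetical and chosen for illustration only.

```python
from itertools import combinations


def is_unique(rows, cols):
    """Return True if no value combination over `cols` occurs twice in `rows`."""
    seen = set()
    for row in rows:
        key = tuple(row[c] for c in cols)
        if key in seen:
            return False
        seen.add(key)
    return True


def minimal_uccs(rows, num_cols):
    """Naive level-wise search for all minimal UCCs (illustration only)."""
    found = []
    for size in range(1, num_cols + 1):
        for cols in combinations(range(num_cols), size):
            # A superset of a known UCC is unique but not minimal, so skip it.
            if any(set(ucc) <= set(cols) for ucc in found):
                continue
            if is_unique(rows, cols):
                found.append(cols)
    return found


# Hypothetical sample relation with columns (name, city, group).
rows = [
    ("Ann", "Berlin", 1),
    ("Ann", "Potsdam", 2),
    ("Bob", "Berlin", 2),
]

print(minimal_uccs(rows, 3))
```

In this sample no single column is unique, but every pair of columns is, so each pair is reported as a minimal UCC. The number of candidate subsets grows exponentially with the number of attributes, which is why a search of this kind does not scale to the dataset sizes mentioned in the abstract.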

Review Activity

  • ACM Transactions on the Web
  • DESWeb 2014