Hasso-Plattner-Institut
Prof. Dr. Felix Naumann
  
 

Hazar Harmouch

Research Assistant, PhD Candidate

Information Systems Research Group
Hasso-Plattner-Institut | Universität Potsdam
Prof.-Dr.-Helmert-Straße 2-3, D-14482 Potsdam

Contact Information

Research Interests

  • Web table similarity and table joins
  • Data Profiling
  • Data Mining
  • Big Data

Projects

Teaching

Publications

Discovery of Genuine Functional Dependencies from Relational Data with Missing Values

Berti-Equille, Laure; Harmouch, Hazar; Naumann, Felix; Novelli, Noel; Thirumuruganathan, Saravanan. In Proceedings of the VLDB Endowment (PVLDB), volume 11, pages 880-892. 2018.

Functional dependencies (FDs) play an important role in maintaining data quality. They can be used to enforce data consistency and to guide repairs over a database. In this work, we investigate the problem of missing values and its impact on FD discovery. With existing FD discovery algorithms, some genuine FDs cannot be detected precisely because of missing values, while some non-genuine FDs are discovered only because missing values are interpreted under a certain NULL semantics. We define a notion of genuineness and propose algorithms to compute the genuineness score of a discovered FD. This score can be used to identify the genuine FDs among the set of all valid dependencies that hold on the data. We evaluate the quality of our method over various real-world and semi-synthetic datasets with extensive experiments. The results show that our method performs well for relatively large FD sets and is able to accurately capture genuine FDs.
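To make the effect of NULL semantics concrete, here is a minimal sketch of a plain FD validity check under two common interpretations of missing values (NULL = NULL and NULL ≠ NULL). It is an illustration only and is not the genuineness score or any algorithm from the paper; the function name `fd_holds`, its parameters, and the sample rows are all hypothetical.

```python
from itertools import count


def fd_holds(rows, lhs, rhs, null_equals_null):
    """Return True if the FD lhs -> rhs holds on rows (a list of dicts).

    None marks a missing value. With null_equals_null=True all NULLs are
    treated as one shared value; otherwise every NULL is distinct from
    every other value, including other NULLs.
    """
    fresh = count()  # supplies a unique token per NULL under NULL != NULL

    def project(row, cols):
        out = []
        for col in cols:
            value = row[col]
            if value is None:
                out.append(("NULL",) if null_equals_null else ("NULL", next(fresh)))
            else:
                out.append(("VAL", value))
        return tuple(out)

    seen = {}  # maps each lhs projection to the first rhs projection seen
    for row in rows:
        left = project(row, lhs)
        right = project(row, rhs)
        if seen.setdefault(left, right) != right:
            return False  # same lhs values, different rhs values
    return True


# Hypothetical sample data: two rows miss the zip code.
rows = [
    {"zip": None, "city": "Potsdam"},
    {"zip": None, "city": "Berlin"},
    {"zip": "14482", "city": "Potsdam"},
]

violated = fd_holds(rows, ["zip"], ["city"], null_equals_null=True)
satisfied = fd_holds(rows, ["zip"], ["city"], null_equals_null=False)
```

On this sample, zip -> city is violated when NULLs are considered equal and holds when each NULL is considered distinct, so the same data yields different sets of valid FDs depending on the chosen semantics.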
Tags discovery  functional_dependencies  isg  missing  null  profiling