With the advent of Big Data as a dominant theme in both research and industry, we have come to recognize the importance of data quality. This recognition has sparked various new research projects and ideas, often building on fundamentals laid out in previous decades. In this seminar, we will examine these fundamentals and how they are put to use in modern systems.
The seminar is based on the excellent overview article "Trends in Cleaning Relational Data: Consistency and Deduplication" by Ihab Ilyas and Xu Chu. Each student will cover one main section of this article and review the related work:
- Taxonomy of Anomaly Detection Techniques
  - What to Detect
  - How to Detect
  - Where to Detect
- Taxonomy of Data Repairing Techniques
  - What to Repair
  - How to Repair
  - Where to Repair