Horrible Data and How to Clean It

With the advent of Big Data as a dominant theme in both research and industry, we have come to recognize the importance of good data quality. This recognition has sparked various new research projects and ideas, often based on fundamentals laid out in previous decades. In the course of this seminar we will examine these fundamentals and how they are put to use in modern systems.

The seminar is based on the excellent overview article "Trends in Cleaning Relational Data: Consistency and Deduplication" by Ihab Ilyas and Xu Chu. Each student will cover one main section of this article and survey the related work:

Taxonomy of Anomaly Detection Techniques
1. What to Detect
2. How to Detect
3. Where to Detect
Taxonomy of Data Repairing Techniques
1. What to Repair
2. How to Repair
3. Where to Repair

Organisation

Extent: 2 SWS (semester hours per week)
Wednesdays, 9:15 - 10:45, in Room G-3.E.11, Campus III; from May 14th at Campus II, Building F, Room F-2.11
Maximum of 6 participants
The first date serves as an introduction to the topic and the seminar. Afterwards, you can register for the course by sending an informal email to felix.naumann@hpi.de by April 20. If more than six students register, I will assign the slots at random.
Your tasks
- Active participation at all meetings
- At least three individual meetings to prepare presentations and assignment
- One short (5+5 min) and one long (30+10 min) presentation
- Short written summary of topic: 6 pages according to the ACM template

Schedule

Date	Topic	Presenter
11.4.	Introduction: Data Quality and Data Cleansing	Felix Naumann
18.4.	%
25.4.	Individual meetings
02.5.	6x short presentation	All
09.5.	Research organisation	Felix Naumann
16.5.	Individual meetings
23.5.	Individual meetings
30.5.	%
06.6.	2x long presentation	Leana Neuber, Sebastian Schmidl
13.6.	2x long presentation	Jonas Keutel, Hendrik Folkerts
20.6.	1x long presentation	Arne Herdick
27.6.	Discussion of paper structures
04.7.	Individual meetings
11.7.	Individual meetings
31.8.	Submission of the written summary