Horrible Data (SE, Master)
Horrible Data and How to Clean it
With the advent of Big Data as a dominating theme in both research and industry, we have come to recognize the importance of good data quality. This recognition has sparked various new research projects and ideas, often based on fundamentals laid out in previous decades. In this seminar we will examine these fundamentals and how they are put to use in modern systems.
The seminar is based on the excellent overview article "Trends in Cleaning Relational Data: Consistency and Deduplication" by Ihab Ilyas and Xu Chu. Each student will cover one main section of this article and review related work:
- Taxonomy of Anomaly Detection Techniques
  - What to Detect
  - How to Detect
  - Where to Detect
- Taxonomy of Data Repairing Techniques
  - What to Repair
  - How to Repair
  - Where to Repair
Organisation
- Extent: 2 SWS
- Wednesdays, 9:15 to 10:45, in Room G-3.E.11, Campus III; from May 14th at Campus II, Building F, Room F-2.11
- Maximum of 6 participants
- The first meeting serves as an introduction to the topic and the seminar. Afterwards, you can register for the course by sending an informal email to felix.naumann@hpi.de by April 20. If more than six students register, I will assign the slots randomly.
- Your tasks
  - Active participation in all meetings
  - At least three individual meetings to prepare the presentations and the written assignment
  - One short (5+5 min) and one long (30+10 min) presentation
  - Short written summary of the topic: 6 pages according to the ACM template
Schedule
| Date | Topic | Presenter |
|---|---|---|
| 11.4. | Introduction: Data Quality and Data Cleansing | Felix Naumann |
| 18.4. | % | |
| 25.4. | Individual meetings | |
| 02.5. | 6x short presentation | All |
| 09.5. | Research organisation | Felix Naumann |
| 16.5. | Individual meetings | |
| 23.5. | Individual meetings | |
| 30.5. | % | |
| 06.6. | 2x long presentation | Leana Neuber, Sebastian Schmidl |
| 13.6. | 2x long presentation | Jonas Keutel, Hendrik Folkerts |
| 20.6. | 1x long presentation | Arne Herdick |
| 27.6. | Discussion of paper structures | |
| 04.7. | Individual meetings | |
| 11.7. | Individual meetings | |
| 31.8. | Submission of the written summary | |