Hasso-Plattner-Institut
Prof. Dr. Felix Naumann
 

Horrible Data and How to Clean it

With the advent of Big Data as a dominant theme in both research and industry, we have come to recognize the importance of good data quality. This recognition has sparked various new research projects and ideas, often based on fundamentals laid out in previous decades. In this seminar we will examine these fundamentals and how they are put to use in modern systems.

The seminar is based on the excellent overview article "Trends in Cleaning Relational Data: Consistency and Deduplication" by Ihab Ilyas and Xu Chu. Each student will cover one main section of this article and survey the related work:

  1. Taxonomy of Anomaly Detection Techniques
    1. What to Detect
    2. How to Detect
    3. Where to Detect
  2. Taxonomy of Data Repairing Techniques
    1. What to Repair
    2. How to Repair
    3. Where to Repair

Organisation

  • Extent: 2 SWS
  • Wednesdays, 9:15 to 10:45, in Room G-3.E.11, Campus III; from May 14th at Campus II, Building F, Room F-2.11
  • Maximum of 6 participants
  • The first meeting serves as an introduction to the topic and the seminar. Afterwards, you can register for the course by sending an informal email to felix.naumann@hpi.de by April 20. If more than six students register, I will assign the slots at random.
  • Your tasks
    • Active participation at all meetings
    • At least three individual meetings to prepare presentations and assignment
    • One short (5+5 min) and one long (30+10 min) presentation
    • Short written summary of topic: 6 pages according to the ACM template

Schedule

Date    Topic                                            Presenter
11.4.   Introduction: Data Quality and Data Cleansing    Felix Naumann
18.4.   %
25.4.   Individual meetings
02.5.   6x short presentation                            All
09.5.   Research organisation                            Felix Naumann
16.5.   Individual meetings
23.5.   Individual meetings
30.5.   %
06.6.   2x long presentation                             Leana Neuber, Sebastian Schmidl
13.6.   2x long presentation                             Jonas Keutel, Hendrik Folkerts
20.6.   1x long presentation                             Arne Herdick
27.6.   Discussion of paper structures
04.7.   Individual meetings
11.7.   Individual meetings
31.8.   Submission of the written report