Prof. Dr. Felix Naumann

Horrible Data and How to Clean it

With the advent of Big Data as a dominant theme in both research and industry, we have come to recognize the importance of good data quality. This recognition has sparked various new research projects and ideas, often based on fundamentals laid out in previous decades. In the course of this seminar we will examine these fundamentals and how they are put to use in modern systems.

The seminar is based on the excellent overview article "Trends in Cleaning Relational Data: Consistency and Deduplication" by Ihab Ilyas and Xu Chu. Each student will cover one main section of this article and survey the related work:

  1. Taxonomy of Anomaly Detection Techniques

    1. What to Detect
    2. How to Detect
    3. Where to Detect

  2. Taxonomy of Data Repairing Techniques

    1. What to Repair
    2. How to Repair
    3. Where to Repair


  • Extent: 2 SWS
  • Wednesdays, 9:15 - 10:45, in Room G-3.E.11, Campus III; from May 14th at Campus II, Building F, Room F-2.11
  • Maximum of 6 participants
  • The first date serves as an introduction to the topic and the seminar. Subsequently, you can register for the course through an informal email to felix.naumann@hpi.de by April 20. In case of more than six registrations, I will assign the slots at random.
  • Your tasks

    • Active participation at all meetings
    • At least three individual meetings to prepare the presentations and the assignment
    • One short (5+5 min) and one long (30+10 min) presentation
    • Short written summary of the topic: 6 pages according to the ACM template


Date | Topic | Presenter
11.4. | Introduction: Data Quality and Data Cleansing | Felix Naumann
18.4. | % |
25.4. | Individual meetings |
02.5. | 6x short presentation | All
09.5. | Research organisation | Felix Naumann
16.5. | Individual meetings |
23.5. | Individual meetings |
30.5. | % |
06.6. | 2x long presentation | Leana Neuber, Sebastian Schmidl
13.6. | 2x long presentation | Jonas Keutel, Hendrik Folkerts
20.6. | 1x long presentation | Arne Herdick
27.6. | Discussion of paper structures |
04.7. | Individual meetings |
11.7. | Individual meetings |
31.8. | Submission of the written report |