Duplicate Detection (Wintersemester 2010/2011)
Lecturer: Prof. Dr. Felix Naumann
- Weekly Hours: 2
- Credits: 3
- Enrolment Deadline: 1.10.2010 - 31.3.2011
- Teaching Form: SP
- Enrolment Type: Compulsory Elective Module
- Maximum number of participants: 10
Programs & Modules
- IT-Systems Engineering A
- IT-Systems Engineering B
- IT-Systems Engineering C
- IT-Systems Engineering D
Duplicate detection is the task of finding multiple representations of the same real-world entity within a dataset. The task is difficult because these representations may differ slightly, so a similarity measure must be defined to compare pairs of records. A further difficulty is the sheer volume of many datasets, which makes a pairwise comparison of all records infeasible.
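To make both difficulties concrete, here is a minimal sketch. It assumes that records are plain strings and uses normalized edit distance (Levenshtein) as the similarity measure with a hypothetical threshold of 0.8. Both are illustrative choices, not the measure or threshold used in the seminar papers. The naive approach compares every pair of records, which requires n(n-1)/2 comparisons for n records.

```python
from itertools import combinations


def levenshtein(a: str, b: str) -> int:
    """Edit distance: minimum number of single-character insertions,
    deletions and substitutions needed to turn a into b."""
    prev = list(range(len(b) + 1))  # distances from "" to each prefix of b
    for i, ca in enumerate(a, start=1):
        curr = [i]  # distance from a[:i] to ""
        for j, cb in enumerate(b, start=1):
            curr.append(min(
                prev[j] + 1,               # delete ca
                curr[j - 1] + 1,           # insert cb
                prev[j - 1] + (ca != cb),  # substitute (free if equal)
            ))
        prev = curr
    return prev[-1]


def similarity(a: str, b: str) -> float:
    """Normalize edit distance to [0, 1]; 1.0 means identical strings."""
    if not a and not b:
        return 1.0
    return 1.0 - levenshtein(a, b) / max(len(a), len(b))


def naive_duplicates(records, threshold=0.8):
    """Compare every pair of records (quadratic) and return the pairs
    whose similarity reaches the (hypothetical) threshold."""
    return [(r1, r2) for r1, r2 in combinations(records, 2)
            if similarity(r1, r2) >= threshold]


def num_pairs(n: int) -> int:
    """Number of comparisons an all-pairs approach needs for n records."""
    return n * (n - 1) // 2


if __name__ == "__main__":
    records = ["John Smith", "Jon Smith", "Jane Doe", "J. Doe"]
    print(naive_duplicates(records))  # [('John Smith', 'Jon Smith')]
    print(num_pairs(len(records)), num_pairs(1_000_000))  # 6 499999500000
```

The pair count shows why the all-pairs strategy does not scale: four records need only 6 comparisons, but one million records already need roughly 5 * 10^11. Note also that "Jane Doe" and "J. Doe" are not matched at this threshold, which illustrates how strongly the result depends on the chosen similarity measure.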
In this seminar, we will discuss several papers covering different aspects of duplicate detection.
- Felix Naumann and Melanie Herschel. An Introduction to Duplicate Detection. Synthesis Lectures on Data Management #3, 2010.
- Peter Christen and Karl Goiser. Quality and Complexity Measures for Data Linkage and Deduplication. Quality Measures in Data Mining, Volume 43, 2007.
Master's seminar for up to 10 students (no teams)
- Active participation in all seminar sessions
- At least one consultation each for the presentation and the written summary
- Presentation at the end of the semester
- Written summary and discussion of the paper (up to 8 pages) using the LaTeX template
- 19.10.2010: Seminar introduction and presentation of the topics
- 25.10.2010: Registration deadline (Email to Uwe Draisbach)
Please check the seminar page regularly for all other seminar dates.