Duplicate Detection (Winter Semester 2010/2011)
Lecturer:
Prof. Dr. Felix Naumann
(Information Systems)
General Information
- Weekly hours per semester: 2
- ECTS: 3
- Graded: Yes
- Enrollment period: 1.10.2010 - 31.3.2011
- Teaching format: SP
- Enrollment type: Compulsory elective module
- Maximum number of participants: 10
Degree Programs, Module Groups & Modules
- IT-Systems Engineering A
- IT-Systems Engineering B
- IT-Systems Engineering C
- IT-Systems Engineering D
- Operating Systems & Information Systems Technology
Description
Duplicate detection is about finding multiple representations of the same real-world entity within a dataset. This task is difficult because representations might differ slightly, so a similarity measure must be defined to compare pairs of records. Another difficulty is the high volume that datasets might have, which makes a pairwise comparison of all records infeasible.
In this seminar, we will discuss several papers covering different aspects of duplicate detection.
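To make the two difficulties named above concrete, here is a minimal sketch of naive duplicate detection. It is not taken from the seminar material: the token-based Jaccard measure, the threshold of 0.5, the function names, and the sample records are all assumptions chosen for illustration.

```python
from itertools import combinations


def jaccard(a: str, b: str) -> float:
    """Token-set Jaccard similarity in [0, 1]; one possible measure among many."""
    ta, tb = set(a.lower().split()), set(b.lower().split())
    if not ta and not tb:
        return 1.0
    return len(ta & tb) / len(ta | tb)


def naive_duplicates(records, threshold=0.5):
    """Compare every pair of records; return index pairs at or above the threshold."""
    return [
        (i, j)
        for (i, a), (j, b) in combinations(enumerate(records), 2)
        if jaccard(a, b) >= threshold
    ]


def pair_count(n: int) -> int:
    """Number of comparisons the naive approach needs for n records."""
    return n * (n - 1) // 2


# Hypothetical records: the first two are meant to describe the same person.
records = [
    "John Smith Berlin",
    "Jon Smith Berlin",
    "Mary Jones Hamburg",
]

print(naive_duplicates(records))  # index pairs judged to be duplicates
print(pair_count(1_000_000))      # comparisons needed for one million records
```

The number of pairs grows quadratically with the number of records, which is why comparing all pairs quickly becomes impractical for large datasets.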
Prerequisites
No requirements.
Literature
- Felix Naumann and Melanie Herschel. An Introduction to Duplicate Detection. Synthesis Lectures on Data Management #3, 2010.
- Peter Christen and Karl Goiser. Quality and Complexity Measures for Data Linkage and Deduplication. Quality Measures in Data Mining, Volume 43, 2007.
Teaching and Learning Formats
Master's seminar for up to 10 students (no teams)
Assessment
- Active participation in all seminar sessions
- At least one consultation each for the presentation and the written summary
- Presentation at the end of the semester
- Written summary and discussion of the paper (up to 8 pages) using the LaTeX template
Dates
- 19.10.2010: Seminar introduction and presentation of the topics
- 25.10.2010: Registration deadline (by email to Uwe Draisbach)
Please check the seminar page regularly for all other seminar dates.