Hasso-Plattner-Institut25 Jahre HPI

Duplicate Detection (Wintersemester 2010/2011)

Lecturer: Prof. Dr. Felix Naumann (Information Systems)

General Information

  • Weekly Hours: 2
  • Credits: 3
  • Graded: yes
  • Enrolment Deadline: 1.10.2010 - 31.3.2011
  • Teaching Form: SP
  • Enrolment Type: Compulsory Elective Module
  • Maximum number of participants: 10

Programs, Module Groups & Modules

IT-Systems Engineering MA
  • IT-Systems Engineering A
  • IT-Systems Engineering B
  • IT-Systems Engineering C
  • IT-Systems Engineering D
IT-Systems Engineering BA

Description

Duplicate detection is about finding multiple representations of the same real-world entity within a dataset. This task is difficult because representations might differ slightly, so some similarity measure must be defined to compare pairs of records. Another difficulty is the high volume that datasets might have, which makes a pairwise comparison of all records infeasible.
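
To make the two difficulties concrete, here is a minimal Python sketch. It is not taken from the seminar material. It assumes records are plain strings, uses the standard library's `difflib.SequenceMatcher` ratio as one possible similarity measure, and picks a threshold of 0.85 arbitrarily. The function names and the first-letter blocking key are hypothetical choices for illustration. The naive version compares all n(n-1)/2 pairs, while the blocked version only compares records that share a blocking key.

```python
from collections import defaultdict
from difflib import SequenceMatcher
from itertools import combinations


def similarity(a: str, b: str) -> float:
    """Return a similarity in [0, 1]; difflib's ratio is just one possible measure."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def naive_duplicates(records, threshold=0.85):
    """Compare every pair of records: n * (n - 1) / 2 comparisons."""
    return [
        (a, b)
        for a, b in combinations(records, 2)
        if similarity(a, b) >= threshold
    ]


def blocked_duplicates(records, threshold=0.85, key=lambda r: r[:1].lower()):
    """Compare only records that share a blocking key (assumed here: first letter)."""
    blocks = defaultdict(list)
    for record in records:
        blocks[key(record)].append(record)
    pairs = []
    for block in blocks.values():
        pairs.extend(naive_duplicates(block, threshold))
    return pairs


# Hypothetical sample data: two names with a misspelled variant each.
records = [
    "Felix Naumann",
    "Felix Nauman",
    "Melanie Herschel",
    "Peter Christen",
    "Peter Cristen",
]

naive_pairs = naive_duplicates(records)
blocked_pairs = blocked_duplicates(records)
```

On this sample both variants should report the same candidate pairs, but the blocked variant performs far fewer comparisons. The trade-off is that a blocking key such as the first letter misses duplicates whose difference falls inside the key itself.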

In this seminar, we will discuss several papers covering different aspects of duplicate detection.

Requirements

No requirements.

Literature

  • Felix Naumann & Melanie Herschel. An Introduction to Duplicate Detection. Synthesis Lectures on Data Management #3, 2010.
  • Peter Christen and Karl Goiser. Quality and Complexity Measures for Data Linkage and Deduplication. Quality Measures in Data Mining, Volume 43, 2007.

Learning

Master's seminar for up to 10 students (no teams)

Examination

  • Active participation in all seminar sessions
  • At least one consultation each for the presentation and the written summary
  • Presentation at the end of the semester
  • Written summary and discussion of the paper (up to 8 pages) using the LaTeX template

Dates

  • 19.10.2010: Seminar introduction and presentation of the topics
  • 25.10.2010: Registration deadline (Email to Uwe Draisbach)

Please check the seminar page regularly for all other seminar dates.
