Hasso-Plattner-Institut25 Jahre HPI

Incremental Duplicate Detection (Summer Semester 2016)

Lecturers: Prof. Dr. Felix Naumann (Information Systems), John Koumarelas (Information Systems)
Course Website: https://hpi.de/en/naumann/teaching/teaching/ss-16/incremental-duplicate-detection.html

General Information

  • Weekly Hours: 4
  • Credits: 6
  • Graded: yes
  • Enrolment Deadline:
  • Teaching Form: Seminar
  • Enrolment Type: Compulsory Elective Module
  • Maximum number of participants: 12

Programs, Module Groups & Modules

IT-Systems Engineering MA
  • IT-Systems Engineering A
  • IT-Systems Engineering B
  • IT-Systems Engineering C
  • IT-Systems Engineering D
IT-Systems Engineering BA

Description

Duplicates in datasets are multiple, different representations of the same real-world object. Detecting them is usually complex. Huge datasets and the online nature of modern systems additionally demand incremental detection on newly arriving data. In this seminar, we want to explore existing techniques for incremental duplicate detection, re-implement them, extend them, and evaluate them.

A naive approach that compares each new record with all existing records costs O(n) comparisons per incoming record, where n is the number of records already stored. This is not feasible in real-time systems. Participants will therefore be provided with relevant literature that proposes systems with advanced indexing techniques for handling this problem.
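To make the contrast between the naive approach and an index-based one concrete, here is a minimal Python sketch. It is not taken from the seminar literature. The blocking key (first three characters), the similarity measure (`difflib.SequenceMatcher`), the threshold of 0.8, and the names `BlockingIndex` and `naive_candidates` are all assumptions chosen for illustration.

```python
from collections import defaultdict
from difflib import SequenceMatcher


def similarity(a: str, b: str) -> float:
    """Case-insensitive string similarity in [0, 1] (standard library)."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def naive_candidates(records: list[str]) -> list[int]:
    """Naive approach: every stored record is a comparison candidate, O(n)."""
    return list(range(len(records)))


class BlockingIndex:
    """Hypothetical incremental index mapping a blocking key to record ids."""

    def __init__(self, threshold: float = 0.8) -> None:
        self.threshold = threshold  # assumed similarity cut-off
        self.records: list[str] = []
        self.blocks: dict[str, list[int]] = defaultdict(list)
        self.comparisons = 0  # counts pairwise comparisons performed

    @staticmethod
    def blocking_key(record: str) -> str:
        # Assumed key: first three characters, lowercased.
        return record.strip().lower()[:3]

    def add(self, record: str) -> list[int]:
        """Insert a record; return ids of stored records judged duplicates."""
        key = self.blocking_key(record)
        duplicates = []
        # Only records sharing the blocking key are compared.
        for rid in self.blocks[key]:
            self.comparisons += 1
            if similarity(record, self.records[rid]) >= self.threshold:
                duplicates.append(rid)
        new_id = len(self.records)
        self.records.append(record)
        self.blocks[key].append(new_id)
        return duplicates


index = BlockingIndex()
for name in ["John Smith", "Maria Garcia", "Jon Smith", "John Smyth"]:
    print(name, "->", index.add(name))
print("comparisons:", index.comparisons)
```

In this sketch each insertion compares the new record only against records in the same block rather than against all stored records. The price is recall: "Jon Smith" gets the key "jon" and is therefore never compared with "John Smith" (key "joh"), which is the kind of trade-off that more advanced indexing techniques try to reduce.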

Requirements

Desired: prior attendance of the Information Integration course or the Data Profiling and Data Cleansing course (students who have taken one of them are given priority if more than 12 students want to participate)

Learning

  • Introductory session
  • Individual meetings with advisors
  • Plenary meetings
  • Team-based software project (teams of 2)

Examination

  • Active participation
  • Short intermediate presentation (10min per team)
  • Long final presentation (30min per team)
  • Report (6 pages)
  • Implementation (efficiency, effectiveness, and extensions)

Dates

Please find the maintained schedule on the course page.
