Hasso-Plattner-Institut
Prof. Dr. Felix Naumann
 

Explainable Data Matching

Description

Data matching is the process of detecting (and subsequently cleaning) multiple representations of the same real-world object within a given dataset. Typical approaches create a candidate set of record pairs, determine the similarity of each pair, and then compare that similarity to a threshold to decide whether the pair is a match. Such data matching systems and their components can be quite complex, and understanding their results is difficult. Building upon the data matching benchmark platform Frost and its implementation Snowman (pdf, github), we plan to develop methods to better explain data matching results to developers and domain experts.
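To make the typical pipeline concrete, here is a minimal sketch in Python of the three steps named above: candidate generation, similarity computation, and thresholding. The records, the blocking on "city", the string similarity from difflib, and the threshold of 0.8 are all assumptions chosen for illustration; they are not taken from Frost or Snowman.

```python
from difflib import SequenceMatcher
from itertools import combinations

# Hypothetical toy dataset; attribute names are illustrative only.
records = [
    {"id": 1, "name": "Jon Smith", "city": "Berlin"},
    {"id": 2, "name": "John Smith", "city": "Berlin"},
    {"id": 3, "name": "Maria Garcia", "city": "Potsdam"},
    {"id": 4, "name": "Jon Smyth", "city": "Hamburg"},
]


def candidate_pairs(records):
    """Step 1: build candidate pairs, here by blocking on an identical city."""
    blocks = {}
    for record in records:
        blocks.setdefault(record["city"], []).append(record)
    for block in blocks.values():
        yield from combinations(block, 2)


def similarity(a, b):
    """Step 2: similarity of two records, here a string ratio on the name only."""
    return SequenceMatcher(None, a["name"].lower(), b["name"].lower()).ratio()


def match(records, threshold=0.8):
    """Step 3: declare a pair a match if its similarity reaches the threshold."""
    return [
        (a["id"], b["id"])
        for a, b in candidate_pairs(records)
        if similarity(a, b) >= threshold
    ]


matches = match(records)
```

Note that record 4 is never compared with records 1 and 2, because the assumed blocking step places it in a different block. Effects of this kind, where a component silently excludes a pair, are one reason the results of such systems can be hard to understand.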

These explanations could be in the form of carefully selected record pairs, a visualization of value similarities, an analysis of dependencies between certain values and misclassification of their records, etc. We will design, implement and test such novel methods, ideally resulting in a submission to a scientific conference.
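As one possible reading of "a visualization of value similarities", the following sketch computes a similarity per attribute for a single record pair and lists the least similar attributes first. The function name explain_pair, the example records, and the use of difflib are assumptions for illustration; this is not a method proposed by the seminar or implemented in Snowman.

```python
from difflib import SequenceMatcher


def explain_pair(a, b, attributes):
    """Return (attribute, similarity) tuples, least similar attribute first."""
    scores = {
        attr: SequenceMatcher(None, str(a[attr]).lower(), str(b[attr]).lower()).ratio()
        for attr in attributes
    }
    return sorted(scores.items(), key=lambda item: item[1])


# Hypothetical record pair.
a = {"name": "John Smith", "city": "Berlin"}
b = {"name": "Jon Smith", "city": "Hamburg"}

for attr, score in explain_pair(a, b, ["name", "city"]):
    # A crude text "bar chart": one '#' per 0.1 of similarity.
    print(f"{attr:<6} {score:.2f} {'#' * round(score * 10)}")
```

A breakdown like this shows a developer or domain expert which values drove a pair towards or away from a match decision.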

Time Table

We meet Tuesdays at 17:00 in F.2-10. The first meeting is open to all. Please send me a binding registration by email by April 29; I will then notify the participants. If more students register than there are slots, I will select participants at random.

Date                 Topic
25.04.2022           Introduction to data matching and topic selection
02.05.2022           Kickoff, introductions and scheduling
10.05.2022 (online)  First insights into research avenues
17.05.2022           Brief presentations of related work
24.05.2022           Presentations of solution ideas
31.05.2022           Guest talk: Andrea Baraldi (U Modena) on Landmark Explanations
07.06.2022           Report on team deep-dives
14.06.2022           Student-internal meeting (discuss evaluation methods)
21.06.2022           Intermediate presentations (15 min each)
28.06.2022           Status updates
05.07.2022           Status updates
12.07.2022
19.07.2022
26.07.2022           Final presentations

Final report submission deadline: August 26, 2022

Literature