Hasso-Plattner-Institut
Prof. Dr. Felix Naumann
  
 

Ioannis Koumarelas

I am a Ph.D. student in the Information Systems Research Group. My research started in collaboration with SAP and SAP Concur. During my Ph.D., I have worked in the general area of Data Cleaning and Data Preparation, with a main focus on Duplicate Detection.

Hasso-Plattner-Institut
für Softwaresystemtechnik
Prof.-Dr.-Helmert-Straße 2-3
D-14482 Potsdam
Office: F-2.05, Campus II

Phone: +49 331 5509 1377
Email: Ioannis Koumarelas
Research: GoogleScholar, ResearchGate, DBLP
Profiles: LinkedIn, GitHub

Research Interests

  • Duplicate Detection (Record Linkage, Entity Resolution etc.), Data Cleaning, Data Preparation
  • Address Geocoding
  • Parallel and Distributed Systems, Big Data Management
  • Data Profiling
  • Data Mining, Machine Learning, Deep Learning

Projects

Cooperation project with SAP and SAP Concur on Vendor Data Cleaning for hotels. Our main task has been to apply Duplicate Detection, that is, to identify duplicates and understand their causes. The approaches we followed mainly use data preparation and matching dependencies; more information is available in our publications.

Publication list

Towards Progressive Search-driven Entity Resolution

Pietrangelo, Alberto; Simonini, Giovanni; Bergamaschi, Sonia; Naumann, Felix; Koumarelas, Ioannis in Italian Symposium on Advanced Database Systems (SEBD) 2018.

Keyword-search systems for databases aim to answer a user query composed of a few terms with a ranked list of records. They are powerful and easy-to-use data exploration tools for a wide range of contexts. For instance, given a product database gathered by scraping e-commerce websites, these systems enable even non-technical users to explore the item set (e.g., to check whether it contains certain products or not, or to discover the price of an item). However, if the database contains dirty records (i.e., incomplete and duplicated records), a preprocessing step to clean the data is required. One fundamental data cleaning step is Entity Resolution, i.e., the task of identifying and fusing together all the records that refer to the same real-world entity. This task is typically executed on the whole data, independently of: (i) the portion of the entities that a user may indicate through keywords, and (ii) the order priority that a user might express through an order by clause. This paper describes a first step to solve the problem of progressive search-driven Entity Resolution: resolving all the entities described by a user through a handful of keywords, progressively (according to an order by clause). We discuss the features of our method, named SearchER, and showcase some examples of keyword queries on two real-world datasets obtained with a demonstrative prototype that we have built.
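
To make the idea in the abstract concrete, here is a minimal Python sketch of search-driven, progressive resolution. It is not the SearchER algorithm from the paper. The record layout, the function names (`matches_query`, `similar`, `progressive_search_er`), the substring keyword filter, and the string-similarity threshold are all assumptions chosen for illustration. The sketch only resolves records that match the keywords, and it yields entities one at a time in the order given by the order-by attribute.

```python
from difflib import SequenceMatcher


def matches_query(record, keywords):
    """True if every keyword occurs in the record's name (case-insensitive)."""
    name = record["name"].lower()
    return all(k.lower() in name for k in keywords)


def similar(a, b, threshold=0.85):
    """Hypothetical matcher: plain string similarity of the two names."""
    ratio = SequenceMatcher(None, a["name"].lower(), b["name"].lower()).ratio()
    return ratio >= threshold


def progressive_search_er(records, keywords, order_by):
    """Yield resolved entities (lists of duplicate records) one at a time,
    in ascending order of `order_by`, looking only at records that match
    the keyword query instead of cleaning the whole dataset up front."""
    candidates = sorted(
        (r for r in records if matches_query(r, keywords)),
        key=lambda r: r[order_by],
    )
    resolved = set()  # positions in `candidates` already assigned to an entity
    for i, seed in enumerate(candidates):
        if i in resolved:
            continue
        cluster = [seed]
        resolved.add(i)
        for j in range(i + 1, len(candidates)):
            if j not in resolved and similar(seed, candidates[j]):
                cluster.append(candidates[j])
                resolved.add(j)
        yield cluster  # the caller can stop consuming at any time


# Made-up product records, for illustration only.
products = [
    {"name": "Acme Phone X 64GB", "price": 299},
    {"name": "acme phone x 64 gb", "price": 310},
    {"name": "Acme Phone Z 128GB", "price": 450},
    {"name": "Bolt Laptop 15", "price": 900},
]

entities = list(progressive_search_er(products, ["acme", "phone"], "price"))
for entity in entities:
    print([r["name"] for r in entity])
```

Because the function is a generator, a caller interested only in the cheapest matching entity can take the first result and stop, which is the progressive aspect the abstract describes.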