Hasso-Plattner-Institut
Prof. Dr. Felix Naumann
  
 

Ioannis Koumarelas

Ph.D. candidate in the Information Systems Research Group at the Hasso Plattner Institute, in IT-Systems Engineering.

Research Interests

High Interest:

  • Data Cleansing / Data Deduplication / Data Reconciliation / Record Linkage / Data Matching / ...
  • Address Geocoding
  • Parallel and Distributed Systems
  • Big Data Management

Medium Interest: Machine Learning, Data Mining, Stream Processing, Web Mining, Information Retrieval

Low Interest: Information Theory, Data Science in general, Complexity in all of the above and other areas, Virtualization

Projects

Cooperation project with SAP and Concur: Vendor Data Cleansing.

Master Thesis - Topics

If your research interests overlap with mine, please contact me. You can find examples of master's theses that I am willing to supervise here, or on our official web page listing thesis topics.

Contact Information

Prof.-Dr.-Helmert-Straße 2-3
D-14482 Potsdam
Room: G-3.1.14

Phone: +49 331 5509 1377
Email: Ioannis Koumarelas

Publication list

Towards Progressive Search-driven Entity Resolution

Pietrangelo, Alberto; Simonini, Giovanni; Bergamaschi, Sonia; Naumann, Felix; Koumarelas, Ioannis in SEBD 2018.

Keyword-search systems for databases aim to answer a user query composed of a few terms with a ranked list of records. They are powerful and easy-to-use data exploration tools for a wide range of contexts. For instance, given a product database gathered by scraping e-commerce websites, these systems enable even non-technical users to explore the item set (e.g., to check whether it contains certain products, or to discover the price of an item). However, if the database contains dirty records (i.e., incomplete and duplicated records), a preprocessing step to clean the data is required. One fundamental data cleaning step is Entity Resolution, i.e., the task of identifying and fusing together all the records that refer to the same real-world entity. This task is typically executed on the whole data, independently of: (i) the portion of the entities that a user may indicate through keywords, and (ii) the order priority that a user might express through an order by clause. This paper describes a first step to solve the problem of progressive search-driven Entity Resolution: resolving all the entities described by a user through a handful of keywords, progressively (according to an order by clause). We discuss the features of our method, named SearchER, and showcase some examples of keyword queries on two real-world datasets obtained with a demonstrative prototype that we have built.