Hasso-Plattner-Institut
Prof. Dr. Felix Naumann
  
 

Ioannis Koumarelas

Ph.D. candidate at the Information Systems Research Group at Hasso Plattner Institute in IT-Systems Engineering.

Research Interests

High Interest:

  • Data Cleansing / Data Deduplication / Data Reconciliation / Record Linkage / Data Matching / ...
  • Address Geocoding
  • Parallel and Distributed Systems
  • Big Data Management

Medium Interest: Machine Learning, Data Mining, Stream Processing, Web Mining, Information Retrieval

Low Interest: Information Theory, Data Science in general, complexity aspects of all of the above and of other topics, Virtualization

Projects

Cooperation project with SAP and Concur: vendor data cleansing.

Master Thesis - Topics

If your research interests overlap with mine, please contact me. You can find examples of master theses that I am willing to supervise here, or on our official web page listing thesis topics.

Contact Information

Prof.-Dr.-Helmert-Straße 2-3
D-14482 Potsdam
Room: G-3.1.14

Phone: +49 331 5509 1377
Email: Ioannis Koumarelas

Publication list

Experience: Enhancing Address Matching with Geocoding and Similarity Measure Selection

Koumarelas, Ioannis; Kroschk, Axel; Mosley, Clifford; Naumann, Felix. In: J. Data and Information Quality, 2018.

Given a query record, record matching is the problem of finding database records that represent the same real-world object. In the easiest scenario, a database record is completely identical to the query. However, in most cases, problems do arise, for instance, as a result of data errors or data integrated from multiple sources or received from restrictive form fields. These problems are usually difficult, because they require a variety of actions, including field segmentation, decoding of values, and similarity comparisons, each requiring some domain knowledge. In this article, we study the problem of matching records that contain address information, including attributes such as Street-address and City. To facilitate this matching process, we propose a domain-specific procedure to, first, enrich each record with a more complete representation of the address information through geocoding and reverse-geocoding and, second, to select the best similarity measure per each address attribute that will finally help the classifier to achieve the best f-measure. We report on our experience in selecting geocoding services and discovering similarity measures for a concrete but common industry use-case.