Hasso-Plattner-Institut
  
Prof. Dr. Felix Naumann
  
 

Topic

This algorithm tries to find matching columns between two different tables with the help of duplicates, i.e., rows in both tables that describe the same real-world object.

Corresponding Papers

Requirements

  • Java 1.5 or higher
  • 2 *.csv files

  • The first line of every file has to hold the column names, separated by semicolons.
  • Separator character: semicolon (";")
  • Special columns must have the following names:

    • KeyCol (at least one is required): holds the primary key of the table
    • RWOId (at most one column): holds the ID of the real-world object
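
The header rules above can be checked before running the tool. The following is a minimal sketch, not part of DumasOnFiles.jar: the class name DumasHeaderCheck and its methods are hypothetical, and it assumes that the special column names are matched exactly (case-sensitive, no surrounding whitespace), which the requirements do not state explicitly.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper for illustration only; it is not shipped with DumasOnFiles.jar.
class DumasHeaderCheck {

    // Splits the header line on the semicolon separator.
    // The limit -1 keeps trailing empty column names instead of dropping them.
    static String[] columns(String headerLine) {
        return headerLine.split(";", -1);
    }

    // Counts the columns that carry exactly the given name
    // (assumption: names are compared case-sensitively and untrimmed).
    static int count(String[] cols, String name) {
        int n = 0;
        for (String c : cols) {
            if (c.equals(name)) {
                n++;
            }
        }
        return n;
    }

    // Returns the list of violated rules; an empty list means the header
    // satisfies the two stated requirements for special columns.
    static List<String> problems(String headerLine) {
        List<String> out = new ArrayList<String>();
        String[] cols = columns(headerLine);
        if (count(cols, "KeyCol") < 1) {
            out.add("at least one KeyCol column is required");
        }
        if (count(cols, "RWOId") > 1) {
            out.add("at most one RWOId column is allowed");
        }
        return out;
    }

    public static void main(String[] args) {
        // A header with one KeyCol and one RWOId column: no problems reported.
        System.out.println(problems("KeyCol;RWOId;A;B"));
        // A header without KeyCol and with two RWOId columns: both rules are violated.
        System.out.println(problems("A;B;RWOId;RWOId"));
    }
}
```

To check a real file, the first line of each *.csv file would be read and passed to problems(...); the sketch only inspects the header and does not validate the data rows.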

Command

  • java -jar DumasOnFiles.jar <file1> <file2>

Sample files

Using the sample files listed below, you should get the following solution:

  • Matchings:

    • B -> B'
    • E -> E'

  • Unmatched sampleR.csv:

    • A, C, D

  • Unmatched sampleS.csv:

    • F, G

Download