Hasso-Plattner-Institut
Prof. Dr. Felix Naumann
 

Topic

This algorithm tries to match the columns of two different tables with the help of duplicates, i.e. rows in both tables that describe the same real-world object.

Corresponding Papers

Requirements

  • Java 1.5 or higher
  • Two *.csv files
  • The first line of every file must contain the column names, separated by semicolons.
  • Separator character: semicolon (";")
  • Special columns must have the following names:
    • KeyCol (at least one is required): holds the primary key of the table
    • RWOId (at most one column): holds the ID of the real-world object
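The header requirements above can be checked before running the tool. The following is only a minimal sketch and not part of DumasOnFiles.jar: the class name DumasHeaderCheck and the method check are hypothetical, and it assumes that the special column names are matched exactly (case-sensitive) and that whitespace around a name can be ignored.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not part of DumasOnFiles.jar.
public class DumasHeaderCheck {

    /** Returns the problems found in a header line; the list is empty if none were found. */
    static List<String> check(String headerLine) {
        List<String> problems = new ArrayList<String>();
        // Column names are separated by semicolons; -1 keeps trailing empty names.
        String[] names = headerLine.split(";", -1);
        int keyCols = 0;
        int rwoIds = 0;
        for (String raw : names) {
            // Assumption: surrounding whitespace is not part of the column name.
            String name = raw.trim();
            if (name.length() == 0) {
                problems.add("empty column name");
            } else if (name.equals("KeyCol")) {
                keyCols++;
            } else if (name.equals("RWOId")) {
                rwoIds++;
            }
        }
        // At least one KeyCol column is required.
        if (keyCols < 1) {
            problems.add("no KeyCol column");
        }
        // RWOId may appear once or not at all.
        if (rwoIds > 1) {
            problems.add("more than one RWOId column");
        }
        return problems;
    }

    public static void main(String[] args) {
        System.out.println(check("KeyCol;RWOId;A;B;C")); // prints []
        System.out.println(check("A;B;C"));              // prints [no KeyCol column]
    }
}
```

The header lines used in main are made-up examples, not the headers of the sample files.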

Command

  • java -jar DumasOnFiles.jar <file1> <file2>

Sample files

Running the tool on the sample files listed below (sampleR.csv and sampleS.csv) should produce the following result:

  • Matchings:
    • B -> B'
    • E -> E'
  • Unmatched columns in sampleR.csv:
    • A, C, D
  • Unmatched columns in sampleS.csv:
    • F, G

Download