Dumas (Duplicate based Matching of Schemas)
Topic
This algorithm finds matches between the columns of two different tables with the help of duplicates, i.e. rows in both tables that describe the same real-world object.
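The idea of matching columns through duplicates can be sketched in a few lines of Java. This is a simplified illustration only, not the method from the paper or the code inside DumasOnFiles.jar: it assumes the duplicate row pairs are already known, compares values by plain case-insensitive equality, and picks column pairs greedily. The class name, method names and the example rows are made up for this sketch.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

class DuplicateMatchSketch {

    /**
     * For every column pair (i, j), counts how many duplicate pairs have an
     * equal value in column i of table R and column j of table S.
     * Each element of duplicatePairs is { rowFromR, rowFromS }.
     */
    static int[][] agreementCounts(List<String[][]> duplicatePairs, int colsR, int colsS) {
        int[][] counts = new int[colsR][colsS];
        for (String[][] pair : duplicatePairs) {
            String[] r = pair[0];
            String[] s = pair[1];
            for (int i = 0; i < colsR; i++) {
                for (int j = 0; j < colsS; j++) {
                    // Simplification: exact, case-insensitive equality.
                    if (r[i].trim().equalsIgnoreCase(s[j].trim())) {
                        counts[i][j]++;
                    }
                }
            }
        }
        return counts;
    }

    /**
     * Greedy one-to-one assignment: repeatedly takes the unused column pair
     * with the highest positive count. result[i] is the matched column of S
     * for column i of R, or -1 if column i stays unmatched.
     */
    static int[] greedyMatch(int[][] counts) {
        int colsR = counts.length;
        int colsS = colsR == 0 ? 0 : counts[0].length;
        int[] match = new int[colsR];
        Arrays.fill(match, -1);
        boolean[] usedS = new boolean[colsS];
        while (true) {
            int bestI = -1;
            int bestJ = -1;
            int best = 0;
            for (int i = 0; i < colsR; i++) {
                if (match[i] != -1) continue;
                for (int j = 0; j < colsS; j++) {
                    if (!usedS[j] && counts[i][j] > best) {
                        best = counts[i][j];
                        bestI = i;
                        bestJ = j;
                    }
                }
            }
            if (bestI < 0) break; // no remaining pair with any agreement
            match[bestI] = bestJ;
            usedS[bestJ] = true;
        }
        return match;
    }

    public static void main(String[] args) {
        // Made-up data: R has columns (name, city, phone), S has (tel, fullname).
        List<String[][]> dups = new ArrayList<String[][]>();
        dups.add(new String[][] { { "Ann Lee", "Berlin", "555-1" }, { "555-1", "ann lee" } });
        dups.add(new String[][] { { "Bob Ray", "Potsdam", "555-2" }, { "555-2", "Bob Ray" } });
        int[] match = greedyMatch(agreementCounts(dups, 3, 2));
        System.out.println(Arrays.toString(match)); // prints [1, -1, 0]
    }
}
```

In the made-up example, name is matched to fullname, phone to tel, and city stays unmatched, because only those column pairs agree on the duplicate rows.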
Corresponding Papers
- Schema Matching Using Duplicates
(by Alexander Bilke, Felix Naumann)
Requirements
- Java 1.5 or higher
- 2 *.csv files
- The first line of every file must hold the column names, separated by semicolons.
- Separator character: semicolon (";")
- Special columns must have the following names:
- KeyCol (at least one is required): holds the primary key of the table
- RWOId (one or no column): holds the ID of the real-world object that the row describes
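Before running the tool, the header requirements above can be checked with a small helper. The following is a minimal sketch and not part of DumasOnFiles.jar. The class and method names are hypothetical, and it assumes that the special column names must match exactly (case-sensitive) and that column names contain no quoted semicolons.

```java
import java.util.ArrayList;
import java.util.List;

class HeaderCheck {

    /** Splits a header line on semicolons and trims each column name. */
    static List<String> parseHeader(String headerLine) {
        List<String> names = new ArrayList<String>();
        // Limit -1 keeps trailing empty names instead of dropping them.
        for (String part : headerLine.split(";", -1)) {
            names.add(part.trim());
        }
        return names;
    }

    /** True if the header has at least one KeyCol and at most one RWOId. */
    static boolean isValidHeader(String headerLine) {
        int keyCols = 0;
        int rwoIds = 0;
        for (String name : parseHeader(headerLine)) {
            if (name.equals("KeyCol")) keyCols++;
            if (name.equals("RWOId")) rwoIds++;
        }
        return keyCols >= 1 && rwoIds <= 1;
    }

    public static void main(String[] args) {
        System.out.println(isValidHeader("KeyCol;RWOId;A;B")); // prints true
        System.out.println(isValidHeader("A;B;C"));            // prints false
    }
}
```

To check a real file, the first line could be read with a BufferedReader and passed to isValidHeader.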
Command
- java -jar DumasOnFiles.jar <file1> <file2>
Sample files
Running the command on the sample files listed below should give the following result:
- Matchings:
- B -> B'
- E -> E'
- Unmatched sampleR.csv:
- A, C, D
- Unmatched sampleS.csv:
- F, G
Download
- jar archive: DumasOnFiles.jar
- Sample files: sampleR.csv sampleS.csv