Hasso-Plattner-Institut
Prof. Dr. Felix Naumann
 

Amazon-Walmart dataset

Contains product information from the Amazon and Walmart product catalogues. The data was obtained from here and has been processed into the following:

  • Dataset
    • We have merged the two relations into a single file for use in deduplication. Simple data preparation has been applied: lower-casing and removal of special characters. Available in tab-separated values (TSV) format (24,583 objects).
  • Duplicates
  • Non-duplicates
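
The data preparation described for the dataset (lower-casing and removal of special characters) could look roughly like the following sketch. It is not the script used to produce the files. The function names `normalize` and `normalize_tsv_line` are hypothetical, and the definition of "special character" (anything other than a-z, 0-9, and whitespace) is an assumption, as is the choice to replace such characters with a space and collapse repeated whitespace.

```python
import re


def normalize(value: str) -> str:
    """Lower-case a value and drop special characters (assumed definition)."""
    lowered = value.lower()
    # Assumption: a "special character" is anything other than a-z, 0-9,
    # or whitespace. Each one is replaced by a space so tokens stay separated.
    cleaned = re.sub(r"[^a-z0-9\s]", " ", lowered)
    # Collapse runs of whitespace left behind by the replacement.
    return " ".join(cleaned.split())


def normalize_tsv_line(line: str) -> str:
    """Apply normalize() to every field of one tab-separated line."""
    fields = line.rstrip("\n").split("\t")
    return "\t".join(normalize(field) for field in fields)
```

For example, `normalize("Apple iPod Touch (8GB) - Black!")` yields `apple ipod touch 8gb black`. Normalizing field by field keeps the tab delimiters intact, so the column layout of the TSV file is preserved.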