Prof. Dr. Felix Naumann


      Dataset 1

      This dataset includes 9763 CDs randomly extracted from freeDB.

      • Dataset
        The data was converted from plain text to XML and is packed into a zip archive.
      • Duplicates (298 objects)
        A list of all duplicates in the dataset.
      • Duplicates (new in 2018) (299 objects)
        This is an updated list; we had missed a transitive duplicate pair.
      • Schema of the dataset
        This is a pdf representation of the schema of the dataset.
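
The updated duplicate list exists because a transitive pair was missed: if A duplicates B and B duplicates C, then A and C are duplicates as well. The sketch below shows one way to check a pair list for such gaps using union-find. It is only an illustration: the identifiers (`cd1`, `cd2`, `cd3`) and the function name `transitive_closure` are hypothetical, and the on-disk format of the duplicate lists is not assumed here.

```python
from collections import defaultdict
from itertools import combinations


def transitive_closure(pairs):
    """Return every duplicate pair implied by the given pairs."""
    parent = {}

    def find(x):
        # Register unseen items as their own root, then walk up to the root.
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    # Merge the clusters of each listed pair.
    for a, b in pairs:
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            parent[root_a] = root_b

    # Group items by cluster root.
    clusters = defaultdict(list)
    for item in parent:
        clusters[find(item)].append(item)

    # Every two members of a cluster form a duplicate pair.
    closed = set()
    for members in clusters.values():
        for a, b in combinations(sorted(members), 2):
            closed.add((a, b))
    return closed


# Hypothetical identifiers, not taken from the dataset.
listed = [("cd1", "cd2"), ("cd2", "cd3")]
missing = transitive_closure(listed) - {tuple(sorted(p)) for p in listed}
print(missing)  # {('cd1', 'cd3')}
```

Pairs are normalized to sorted order so that (A, B) and (B, A) count as the same pair when comparing against the listed duplicates.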

      Dataset 2

      This dataset consists of 500 clean CD objects extracted from the FreeDB database and 500 artificially generated duplicates (one for each CD) created with the Dirty XML Data Generator.