Hasso-Plattner-Institut
  
Prof. Dr. Felix Naumann
  
 

Content

      Dataset 1

      This dataset includes 9763 CDs randomly extracted from freeDB.

      • Dataset
        The data was converted from plain text to XML and is packaged as a ZIP archive.
      • Duplicates (298 objects)
        A list of all duplicates in the dataset.

      • Schema of the dataset
        The schema of the dataset, provided as a PDF file.
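
      As a quick sanity check after downloading, the XML inside the archive can be inspected without unpacking it. The following is a minimal sketch, not part of the original distribution: the function name `count_top_level_elements`, the archive name, and the tag name `disc` in the usage note are hypothetical, and the real element names should be taken from the schema PDF.

```python
import zipfile
import xml.etree.ElementTree as ET
from collections import Counter


def count_top_level_elements(zip_source, member_name=None):
    """Count the direct children of the XML root, grouped by tag name.

    zip_source may be a path or a file-like object. If member_name is
    omitted, the first member ending in ".xml" is used (an assumption
    about how the archive is laid out).
    """
    with zipfile.ZipFile(zip_source) as archive:
        if member_name is None:
            member_name = next(
                name for name in archive.namelist()
                if name.lower().endswith(".xml")
            )
        with archive.open(member_name) as handle:
            root = ET.parse(handle).getroot()
    return Counter(child.tag for child in root)
```

      Calling, for example, `count_top_level_elements("dataset1.zip")` (hypothetical file name) returns a counter keyed by tag name. If each CD is stored as one top-level element, the count for that tag should match the number of CDs stated above.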

      Dataset 2

      This dataset consists of 500 clean CD objects extracted from the FreeDB database and 500 artificially generated duplicates (one for each CD), created with the Dirty XML Data Generator.