Scalable Inclusion Dependency Discovery

Shaabani, Nuhad; Meinel, Christoph. In Proceedings of the 20th International Conference on Database Systems for Advanced Applications (DASFAA 2015), volume 9049 of Lecture Notes in Computer Science, pages 425-440. Springer, 2015.

Inclusion dependencies within and across databases are an important kind of relationship for many applications in anomaly detection, schema (re-)design, query optimization, and data integration. When such dependencies are not available as explicit metadata, scalable and efficient algorithms have to discover them from a given data instance. We introduce a new idea for clustering the attributes of database relations. Based on this idea, we have developed S-indd, an efficient and scalable algorithm for discovering all unary inclusion dependencies in large datasets. S-indd scales both in the number of attributes and in the number of rows. We show that previous approaches are special cases of S-indd. We evaluate S-indd's scalability exhaustively on many datasets with several thousand attributes and up to one million rows. The experiments show that S-indd is up to 11x faster than previous approaches.
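
To make the term concrete: a unary inclusion dependency A ⊆ B holds when every value of attribute A also appears in attribute B. The following is a minimal sketch of a generic inverted-index approach to finding such pairs. It is not S-indd and does not reproduce the attribute clustering described in the paper. The function name `unary_inds`, the input layout (a dict from attribute name to values), and the sample attribute names are all assumptions made for illustration.

```python
from collections import defaultdict


def unary_inds(columns):
    """Return all pairs (dependent, referenced) such that every non-null
    value of `dependent` also occurs in `referenced`.

    `columns` is a hypothetical input layout: a dict mapping an attribute
    name to an iterable of its values.
    """
    # Inverted index: value -> set of attributes that contain this value.
    attrs_by_value = defaultdict(set)
    for attr, values in columns.items():
        for value in values:
            if value is not None:  # assumption: nulls are ignored
                attrs_by_value[value].add(attr)

    # Start by assuming each attribute may be included in every other one.
    # Note: an attribute with no values keeps all candidates (vacuously true).
    candidates = {attr: set(columns) - {attr} for attr in columns}

    # A value of `attr` rules out every attribute that lacks that value.
    for attrs in attrs_by_value.values():
        for attr in attrs:
            candidates[attr] &= attrs

    return {(dep, ref) for dep, refs in candidates.items() for ref in refs}


# Illustrative data only; these attribute names are made up.
sample = {
    "orders.customer_id": [1, 2, 2],
    "customers.id": [1, 2, 3],
    "customers.age": [30, 41, 2],
}
result = unary_inds(sample)
```

With the sample data, the only dependency found is `orders.customer_id ⊆ customers.id`, the typical shape of a foreign-key candidate. The sketch keeps the whole index in memory, so it illustrates the definition rather than the scalability that the paper targets.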
Further information
Tags Data_analysis Data_integration Data_mining Data_profiling Inclusion_dependency_discovery S-indd its
