Hasso-Plattner-Institut
Prof. Dr. Felix Naumann
 

Mining RDF Data

On this page we present our recent results and code on mining RDF data. In particular, we present experimental results on synonym discovery, ontology alignment, and data enrichment. We have also integrated these three use cases into our mining and profiling tool ProLOD++. ProLOD++ is still under development, but you are welcome to try some of its basic features.

Experimental Results

Mining configurations, as introduced in [2,5], can be used to create new facts for entities, to align ontologies with the underlying data, and to discover synonymously used predicates. Below we provide several files that contain artifacts of our experimental results for browsing and review.

Amending RDF Entities with new Facts

Our data-driven amendment algorithm generates new facts based on high-confidence object-to-object rules. Our paper on that algorithm is currently under review. In the meantime, we provide the facts that were actually generated in experimental runs of our algorithm.
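
As a rough illustration of what "high-confidence" means for an object-to-object rule, the following Python sketch computes the standard association-rule confidence of a rule condition -> consequence. It is a simplified sketch and not our implementation: it assumes that all objects of one subject form a single transaction, and the function name `rule_confidence` and the toy triples are hypothetical.

```python
from collections import defaultdict


def rule_confidence(triples, condition, consequence):
    """Confidence of the rule condition -> consequence.

    Assumption: each subject forms one transaction that contains
    all objects occurring in that subject's triples.
    """
    objects_by_subject = defaultdict(set)
    for subject, _predicate, obj in triples:
        objects_by_subject[subject].add(obj)

    # Transactions that contain the condition object.
    with_condition = [
        objs for objs in objects_by_subject.values() if condition in objs
    ]
    if not with_condition:
        return 0.0

    # Among those, count the transactions that also contain the consequence.
    with_both = sum(1 for objs in with_condition if consequence in objs)
    return with_both / len(with_condition)


# Hypothetical toy data: (subject, predicate, object)
triples = [
    ("ep1", "series", "South_Park"), ("ep1", "director", "Trey_Parker"),
    ("ep2", "series", "South_Park"), ("ep2", "director", "Trey_Parker"),
    ("ep3", "series", "South_Park"),
    ("ep4", "series", "Other_Show"), ("ep4", "director", "Someone"),
]

confidence = rule_confidence(triples, "South_Park", "Trey_Parker")
print(confidence)
```

On the toy data, two of the three subjects that have South_Park as an object also have Trey_Parker, so the confidence is 2/3.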

File Structure

In addition to the generated triples, each file contains the set of generated predicate rules and object rules. Each set of generated facts is preceded by its "Condition" and "Consequence" entries. These represent the high-confidence rules Condition -> Consequence on whose basis we selected entities to be amended with a new fact that has the denoted Consequence as its property value.

E.g.:
Condition: dbpedia.org/resource/South_Park
Consequence: dbpedia.org/resource/Trey_Parker
dbpedia.org/resource/Follow_That_Egg%21 dbpedia.org/ontology/director dbpedia.org/resource/Trey_Parker .

Based on the rule

dbpedia.org/resource/South_Park ---> dbpedia.org/resource/Trey_Parker

our algorithm generated the fact

dbpedia.org/resource/Follow_That_Egg%21 dbpedia.org/ontology/director dbpedia.org/resource/Trey_Parker .
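
The example above can be sketched in a few lines of Python. The sketch assumes that the rule and the predicate of the new fact are already known, and that an entity is amended when it has the Condition as an object value but lacks the Consequence. The function `amend_entities`, the `dbpedia.org/ontology/series` triples, and the second episode are hypothetical and serve only as illustration.

```python
def amend_entities(triples, condition, consequence, predicate):
    """Return new triples for subjects that have `condition` as an object
    but do not yet have `consequence` as an object.

    Assumption: the predicate of the new fact is supplied by the caller.
    """
    objects_by_subject = {}
    for subject, _predicate, obj in triples:
        objects_by_subject.setdefault(subject, set()).add(obj)

    return [
        (subject, predicate, consequence)
        for subject, objects in sorted(objects_by_subject.items())
        if condition in objects and consequence not in objects
    ]


RES = "dbpedia.org/resource/"
ONT = "dbpedia.org/ontology/"

# Hypothetical input triples: one episode lacks a director fact.
triples = [
    (RES + "Follow_That_Egg%21", ONT + "series", RES + "South_Park"),
    (RES + "Some_Other_Episode", ONT + "series", RES + "South_Park"),
    (RES + "Some_Other_Episode", ONT + "director", RES + "Trey_Parker"),
]

new_facts = amend_entities(
    triples,
    condition=RES + "South_Park",
    consequence=RES + "Trey_Parker",
    predicate=ONT + "director",
)

for s, p, o in new_facts:
    print(s, p, o, ".")
```

Only the entity that matches the Condition and lacks the Consequence receives a new fact; entities that already carry the Consequence are left unchanged.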

Files with generated facts:

Generated Facts for Thing Entities

Generated Facts for Person Entities

Generated Facts for Album Entities

Generated Facts for Animal Entities

Generated Facts for Artist Entities

Generated Facts for Film Entities

Generated Facts for Organisation Entities

Generated Facts for Place Entities

Generated Facts for Species Entities

Generated Facts for Work Entities

Generated Facts on YAGO2 (the underlying dataset is a cleaned version of YAGO2 as provided by the authors of the AMIE project)

Reconciling Ontologies and the Web of Data

To analyze the performance of our ontology alignment algorithm [4,6], which is based on association rules, we provide the following files. They contain the results of a complete analysis of DBpedia Infoboxes Ontology Data 3.6 and 3.7, as reported in our paper [4].

File Format

Label                 Content
Type:                 Current DBpedia class
Schema                Set of predicates that have the DBpedia class in their domain
Candidates            Set of predicates that are proposed to be added to the class mentioned above
Pushed Candidates:    Set of predicates that are proposed for this class but were originally defined for a different class, as illustrated
Removed Predicates:   Set of predicates that are proposed to be removed from that class

Experiments on DBpedia 3.6

Experiments on DBpedia 3.7

Synonym Analysis for Predicate Expansion

For the evaluation of our synonym discovery approach, we manually checked all possible predicate pairs of two datasets, DBpedia Work 3.7 and Magnatune [3].

The following two files contain all pairs of predicates that three computer scientists classified as synonymously interchangeable. Each line corresponds to a pair of synonym candidates.

DBpedia Work 3.7 synonyms

Magnatune synonyms

Code

To appear. (Currently being polished :-))

Publications

  • Profiling and Mining RDF Data with ProLOD++. Abedjan, Ziawasch; Gruetze, Toni; Jentzsch, Anja; Naumann, Felix (2014).
     
  • Amending RDF Entities with New Facts. Abedjan, Ziawasch; Naumann, Felix (2014).
     
  • Synonym Analysis for Predicate Expansion. Abedjan, Ziawasch; Naumann, Felix (2013).
     
  • Improving RDF Data through Association Rule Mining. Abedjan, Ziawasch; Naumann, Felix in Datenbank-Spektrum (Special Issue on RDF Data Management) (2013). 13(2) 111–120.
     
  • Reconciling Ontologies and the Web of Data. Abedjan, Ziawasch; Lorey, Johannes; Naumann, Felix (2012). 1532–1536.
     
  • Context and Target Configurations for Mining RDF Data. Abedjan, Ziawasch; Naumann, Felix (2011).
     
  • RDF Ontology (Re-)Engineering through Large-scale Data Mining. Lorey, Johannes; Abedjan, Ziawasch; Naumann, Felix; Böhm, Christoph (2011).