Regular Research Papers
Abstract. Despite unified data models, such as Resource Description Framework (RDF) on structural level and the corresponding query language Sparql, the integration and usage of Linked Open Data faces major heterogeneity challenges on the semantic level. Incorrect use of ontology concepts and class properties impede the goal of machine readability and knowledge discovery. For example, users searching for movies with a certain artist cannot rely on a single given property artist, because some movies may be connected to that artist by the predicate starring. In addition, the information need of a data consumer may not always be clear and her interpretation of given schemata may differ from the intentions of the ontology engineer or data publisher.
It is thus necessary to either support users during query formulation or to incorporate implicitly related facts through predicate expansion. To this end, we introduce a data-driven synonym discovery algorithm for predicate expansion. We applied our algorithm to various data sets as shown in a thorough evaluation of different strategies and rule-based techniques for this purpose.
Detecting SPARQL Query Templates for Data Prefetching
Johannes Lorey, Felix Naumann
Abstract. Publicly available Linked Data repositories provide a multitude of information. By utilizing SPARQL, Web sites and services can consume this data and present it in a user-friendly form, e.g., in mash-ups. To gather RDF triples for this task, machine agents typically issue similarly structured queries with recurring patterns against the SPARQL endpoint. These queries usually differ only in a small number of individual triple pattern parts, such as resource labels or literals in objects. We present an approach to detect such recurring patterns in queries and introduce the notion of query templates, which represent clusters of similar queries exhibiting these recurrences. We describe a matching algorithm to extract query templates and illustrate the benefits of prefetching data by utilizing these templates. Finally, we comment on the applicability of our approach using results from real-world SPARQL query logs.
Workshop Paper (USEWOD)
Caching and Prefetching Strategies for SPARQL Queries
Johannes Lorey, Felix Naumann
Abstract. Linked Data repositories offer a wealth of structured facts, useful for a wide array of application scenarios. However, retrieving this data using SPARQL queries yields a number of challenges, such as limited endpoint capabilities and availability, or high latency for connecting to it. To cope with these challenges, we argue that it is advantageous to cache data that is relevant for future information needs. However, instead of only retaining results of previously issued queries, we aim at retrieving data that is potentially interesting for subsequent requests in advance. To this end, we present different methods to modify the structure of a query so that the altered query can be used to retrieve additional related information. We evaluate these approaches by applying them to requests found in real-world SPARQL query logs.
PhD Symposium Paper
Storing and Provisioning Linked Data as a Service
Abstract. Linked Data offers novel opportunities for aggregating information about a wide range of topics and for a multitude of applications. While the technical specifications of Linked Data have been a major research undertaking for the last decade, there is still a lack of real-world data and applications exploiting this data. Partly, this is due to the fact that datasets remain isolated from one another and their integration is a non-trivial task. In this work, we argue for a Data-as-a-Service approach combining both warehousing and query federation to discover and consume Linked Data. We compare our work to state-of-the-art approaches for discovering, integrating, and consuming Linked Data. Moreover, we illustrate a number of challenges when combining warehousing with federation features, and highlight key aspects of our research.