My thesis presents novel ideas and research findings for the Web of Data, a global data space spanning many so-called Linked Open Data sources. Linked Open Data is by now an established concept, and many (mostly academic) publishers have adopted its design for publishing data, building a powerful web of structured knowledge available to everybody. However, the current state of Linked Open Data deployment exhibits several shortcomings, some of which I address in my dissertation.
In this talk, I will present our contribution to entity linking research. Specifically, I will discuss an optimization model for joint entity linking and propose three heuristics that facilitate large-scale data processing, all implemented in the LINked Data Alignment (LINDA) system. Our first solution can exploit multi-core machines, whereas the second and third approaches are designed to run in a distributed shared-nothing environment. I will elaborate on the properties of each approach, leading to recommendations on which algorithm to use in a specific scenario. The distributed algorithms are among the first of their kind, that is, among the first approaches to perform joint entity linking in a distributed fashion. I will illustrate that we can tackle the entity linking problem at very large scale, with data comprising more than 100 million entity representations from a large number of sources.
In summary, my work contributes a set of techniques for enhancing the current state of the Web of Data. All approaches have been tested on large and heterogeneous real-world input.