Hasso-Plattner-Institut
Prof. Dr. Felix Naumann
 

Overview

In this project, we aim to extract semantic business relationships between companies by analyzing web data sources such as news articles, company websites, and structured knowledge bases. Detecting business relationships has many commercial applications, for instance risk, market, and competitor analysis. We currently focus on relationship types such as ownership_of, partnership_of, competitor_of, and supplier_of. Our final goal is to build a semantic graph.

Relationship Extraction Pipeline

We present a semi-supervised relationship extraction strategy that is inspired by the basic pipeline of Snowball [1]. We propose a pipeline that combines named entity recognition, disambiguation, and relationship extraction to extract specific relationships between companies, based on only a few user-provided seed company pairs that are known to participate in the relationship of interest. In doing so, we also provide a solution to the problem of determining the direction of asymmetric relationships, such as ownership_of.
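The seed-based idea can be illustrated with a minimal sketch. It is not the project's implementation: it assumes company mentions have already been recognized and disambiguated, it replaces Snowball's weighted context vectors and confidence scoring with exact matching of the text between two mentions, and all company names, sentences, and function names are hypothetical. Each learned pattern records whether the owner appeared first, which is one simple way to recover the direction of an asymmetric relationship such as ownership_of.

```python
from itertools import permutations


def middle_context(sentence, first, second):
    """Return the lowercased text between `first` and `second`,
    or None unless `first` ends before `second` starts."""
    i = sentence.find(first)
    j = sentence.find(second)
    if i == -1 or j == -1 or i + len(first) > j:
        return None
    return sentence[i + len(first):j].strip().lower()


def learn_patterns(sentences, seeds):
    """Learn patterns from seed pairs given as (owner, owned).

    A pattern is (middle text, owner_first), where owner_first tells
    whether the owner was mentioned before the owned company.
    """
    patterns = set()
    for sentence in sentences:
        for owner, owned in seeds:
            mid = middle_context(sentence, owner, owned)
            if mid:
                patterns.add((mid, True))
            mid = middle_context(sentence, owned, owner)
            if mid:
                patterns.add((mid, False))
    return patterns


def apply_patterns(sentences, companies, patterns):
    """Return directed (owner, owned) pairs whose middle context
    exactly matches a learned pattern."""
    found = set()
    for sentence in sentences:
        present = [c for c in companies if c in sentence]
        for a, b in permutations(present, 2):
            mid = middle_context(sentence, a, b)
            for text, owner_first in patterns:
                if mid == text:
                    found.add((a, b) if owner_first else (b, a))
    return found


# Hypothetical toy corpus; the company names are made up.
sentences = [
    "Alpha Corp acquired Beta Inc in a cash deal.",
    "Delta Ltd was bought by Gamma AG.",
    "Omega SE acquired Sigma LLC last year.",
    "Kappa GmbH was bought by Omega SE.",
]
companies = ["Alpha Corp", "Beta Inc", "Gamma AG", "Delta Ltd",
             "Omega SE", "Sigma LLC", "Kappa GmbH"]
seeds = {("Alpha Corp", "Beta Inc"), ("Gamma AG", "Delta Ltd")}

patterns = learn_patterns(sentences, seeds)
found = apply_patterns(sentences, companies, patterns)
new_pairs = found - seeds
```

In this toy run, the passive pattern "was bought by" is stored with owner_first set to False, so the pair extracted from the last sentence is oriented as (Omega SE, Kappa GmbH) even though the owned company is mentioned first. A full bootstrapping loop would add high-confidence new pairs to the seed set and repeat.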

Experimental Results (ownership_of relationship)

Corpus                        Experiment Type    File
New York Times (1987-2007)    Precision          pdf
New York Times (1987-2007)    Recall             pdf

Experimental results on Wikipedia articles can be downloaded here.

Annotated Data (ownership_of relationship)

All labeled company pairs (NYTimes) can be downloaded here.

Reference

[1] E. Agichtein and L. Gravano. Snowball: Extracting relations from large plain-text collections. In Proceedings of the International Conference on Digital Libraries (DL), pages 85-94, 2000.