Our group includes PostDocs, PhD students, and student assistants, and is headed by Prof. Felix Naumann. If you are interested in joining our team, please contact Felix Naumann.

For bachelor students, we offer German-language lectures on database systems in addition to paper- or project-oriented seminars. Within a one-year bachelor project, students finalize their studies in cooperation with external partners. For master students, we offer courses on information integration, data profiling, and information retrieval, complemented by specialized seminars and master projects, and we advise master theses.

Most of our research is conducted in the context of larger research projects, in collaboration across students, across groups, and across universities. We strive to make available most of our datasets and source code.

Please do not hesitate to reach out to us directly if you cannot find a paper, slides, or other research artifacts.

Overview

In this project, we want to extract semantic business relationships between companies by analyzing web data sources, such as news articles, company websites, and structured knowledge bases. Detecting business relationships has many commercial applications, for instance risk, market, and competitor analysis. We currently focus on relationship types such as ownership_of, partnership_of, competitor_of, and supplier_of. Our final goal is to build a semantic graph.

Relationship Extraction Pipeline

We present a semi-supervised relationship extraction strategy, which is inspired by the basic pipeline of Snowball [1]. We propose a pipeline that combines named entity recognition, disambiguation, and relationship extraction to extract specific relationships between companies. It requires only a few user-provided seed company pairs that are known to participate in the relationship of interest. This approach also provides a solution to the problem of determining the direction of asymmetric relationships, such as ownership_of.
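The seed-based idea described above can be illustrated with a minimal sketch. This is not the project's implementation and it is much simpler than Snowball [1]: it assumes that named entity recognition and disambiguation are replaced by an exact lookup in a given set of company names, that a pattern is just the literal text between two mentions, and that patterns are matched exactly without any confidence scoring. All company names, sentences, and function names are hypothetical. Each pattern stores whether the owner appears on the left, which is one simple way to recover the direction of an asymmetric relationship such as ownership_of.

```python
from itertools import combinations


def mentions(sentence, companies):
    """Toy stand-in for NER and disambiguation: exact dictionary lookup."""
    return sorted((sentence.find(c), c) for c in companies if c in sentence)


def contexts(sentences, companies):
    """Yield (left, middle, right) for each pair of company mentions in a sentence."""
    for s in sentences:
        for (i, a), (j, b) in combinations(mentions(s, companies), 2):
            middle = s[i + len(a):j].strip().lower()
            if middle:  # skip overlapping or adjacent mentions
                yield a, middle, b


def bootstrap(sentences, companies, seeds, rounds=2):
    """Grow a set of directed (owner, owned) pairs from a few seed pairs."""
    known = set(seeds)
    for _ in range(rounds):
        # Learn patterns: the middle text plus whether the owner is on the left.
        patterns = set()
        for left, middle, right in contexts(sentences, companies):
            if (left, right) in known:
                patterns.add((middle, True))
            elif (right, left) in known:
                patterns.add((middle, False))
        # Apply patterns; the stored flag orients each newly found pair.
        for left, middle, right in contexts(sentences, companies):
            if (middle, True) in patterns:
                known.add((left, right))
            if (middle, False) in patterns:
                known.add((right, left))
    return known


# Hypothetical example data, not taken from the project's corpora.
sentences = [
    "Alpha Corp acquired Beta Inc in 1999.",
    "Gamma Ltd acquired Delta Co last year.",
    "Beta Inc is a subsidiary of Alpha Corp.",
    "Epsilon AG is a subsidiary of Zeta SA.",
]
companies = {"Alpha Corp", "Beta Inc", "Gamma Ltd", "Delta Co", "Epsilon AG", "Zeta SA"}
seeds = {("Alpha Corp", "Beta Inc")}

result = bootstrap(sentences, companies, seeds)
```

With the single seed pair, the sketch learns one left-to-right pattern ("acquired") and one right-to-left pattern ("is a subsidiary of"), and uses them to add the pairs (Gamma Ltd, Delta Co) and (Zeta SA, Epsilon AG) in the owner-first orientation.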

Experimental Results (ownership_of relationship)

Corpus	Experiment Type	File
New York Times (1987-2007)	Precision	pdf
New York Times (1987-2007)	Recall	pdf

Experimental results on Wikipedia articles can be downloaded here.

Annotated Data (ownership_of relationship)

All labeled company pairs (NYTimes) can be downloaded here.
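As a rough illustration of how labeled pairs can be used to compute the precision and recall reported above, the following sketch compares a set of extracted pairs against a set of labeled pairs. It assumes both are available as directed (owner, owned) tuples; the actual format of the downloadable file is not specified here, and the function name and example pairs are hypothetical.

```python
def precision_recall(extracted, labeled):
    """Compare directed (owner, owned) pairs against a labeled gold set."""
    extracted, labeled = set(extracted), set(labeled)
    true_positives = len(extracted & labeled)
    precision = true_positives / len(extracted) if extracted else 0.0
    recall = true_positives / len(labeled) if labeled else 0.0
    return precision, recall


# Hypothetical pairs; a pair with the wrong direction counts as an error.
labeled = {("Alpha Corp", "Beta Inc"), ("Gamma Ltd", "Delta Co"), ("Zeta SA", "Epsilon AG")}
extracted = {("Alpha Corp", "Beta Inc"), ("Delta Co", "Gamma Ltd")}

precision, recall = precision_recall(extracted, labeled)
```

Because the pairs are ordered, an extraction that finds the right two companies but reverses owner and owned matches nothing in the labeled set, so it lowers precision and does not contribute to recall.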

Reference

[1] E. Agichtein and L. Gravano. Snowball: Extracting relations from large plain-text collections. In Proceedings of the International Conference on Digital Libraries (DL), pages 85-94, 2000.

Chair

Prof. Dr. Felix Naumann

Information Systems

E-Mail: felix.naumann(at)hpi.de

Assistant: Diana Stephan

Office: Campus II, House F, F-2.01
Tel.: +49 (0)331 5509-280
E-Mail: office-naumann(at)hpi.de

To visit us, please see these directions.
