Michael Loster

Knowledge Base Construction with Machine Learning Methods

Modern knowledge bases contain and organize knowledge from many different topic areas. Apart from information about specific entities, they also store information about the relationships among those entities. Combining this information results in a knowledge graph that can be particularly helpful in cases where relationships are of central importance. Among other applications, modern risk assessment in the financial sector can benefit from the inherent network structure of such knowledge graphs by assessing the consequences and risks of certain events, such as corporate insolvencies or fraudulent behaviour, based on the underlying network structure. As public knowledge bases often do not contain the information necessary for analyzing such scenarios, the need arises to create and maintain dedicated domain-specific knowledge bases.

This thesis investigates the process of creating domain-specific knowledge bases from structured and unstructured data sources. In particular, it addresses the topics of named entity recognition, duplicate detection, and knowledge validation, which represent essential steps in the construction of knowledge bases.

As such, we present a novel method for duplicate detection based on Siamese neural networks that is able to learn a dataset-specific similarity measure which is used to identify duplicates. Using the specialized network architecture, we design and implement a knowledge transfer between two deduplication networks, which leads to significant performance improvements and a reduction of required training data.
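To make the idea of a shared-weight (Siamese) similarity measure concrete, the following is a minimal sketch and not the architecture from the thesis. It assumes records are plain strings, uses hashed character trigrams as input features, and leaves the encoder untrained; the names `featurize`, `encode`, `distance`, and `contrastive_loss`, as well as the vector sizes, are illustrative assumptions.

```python
import math
import random
import zlib

DIM_IN, DIM_OUT = 64, 8  # illustrative sizes, not taken from the thesis


def featurize(text):
    """Hash character trigrams of a record into a fixed-size count vector."""
    vec = [0.0] * DIM_IN
    padded = f"  {text.lower()} "
    for i in range(len(padded) - 2):
        trigram = padded[i:i + 3]
        vec[zlib.crc32(trigram.encode("utf-8")) % DIM_IN] += 1.0
    return vec


def init_weights(seed=0):
    """Create one weight matrix; both branches of the network share it."""
    rng = random.Random(seed)
    return [[rng.gauss(0.0, 0.1) for _ in range(DIM_IN)] for _ in range(DIM_OUT)]


def encode(weights, vec):
    """Single dense layer with tanh, standing in for a deeper encoder."""
    return [math.tanh(sum(w * x for w, x in zip(row, vec))) for row in weights]


def distance(weights, record_a, record_b):
    """Euclidean distance between the two embeddings of a record pair."""
    emb_a = encode(weights, featurize(record_a))
    emb_b = encode(weights, featurize(record_b))
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(emb_a, emb_b)))


def contrastive_loss(dist, is_duplicate, margin=1.0):
    """Pull duplicates together; push non-duplicates at least `margin` apart."""
    if is_duplicate:
        return dist ** 2
    return max(0.0, margin - dist) ** 2
```

Because both records pass through the same weights, training on labeled duplicate and non-duplicate pairs shapes a single embedding space, and the distance in that space acts as the learned, dataset-specific similarity measure. Reusing the weights of one trained network to initialize another would be one simple way to realize a knowledge transfer, though the thesis may do this differently.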

Furthermore, we propose a named entity recognition (NER) approach that is able to identify company names by integrating external knowledge in the form of dictionaries into the training process of a conditional random field classifier. In this context, we study the effects of different dictionaries on the performance of the NER classifier and show that both the inclusion of domain knowledge and the generation and use of alias names result in significant performance improvements.
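One common way to feed a dictionary into a conditional random field is to expose dictionary matches as an additional per-token feature. The sketch below illustrates that idea under stated assumptions: the legal-form list, the alias rule (dropping a trailing legal form), the feature names, and the company name "Example Motors AG" are all made up for illustration and are not taken from the thesis. The resulting feature dictionaries have the shape typically accepted by CRF toolkits, but no particular library is assumed.

```python
# Hypothetical list of legal-form suffixes used to derive alias names.
LEGAL_FORMS = {"ag", "gmbh", "se", "inc", "ltd"}


def generate_aliases(name):
    """Return the name plus a shortened alias without a trailing legal form."""
    tokens = name.split()
    aliases = {name}
    if len(tokens) > 1 and tokens[-1].lower().rstrip(".") in LEGAL_FORMS:
        aliases.add(" ".join(tokens[:-1]))
    return aliases


def build_dictionary(company_names):
    """Store every name and alias as a tuple of lowercased tokens."""
    entries = set()
    for name in company_names:
        for alias in generate_aliases(name):
            entries.add(tuple(alias.lower().split()))
    return entries


def dictionary_flags(tokens, dictionary, max_len=5):
    """Mark each token that lies inside any dictionary match."""
    flags = [False] * len(tokens)
    lowered = [t.lower() for t in tokens]
    for start in range(len(tokens)):
        for end in range(start + 1, min(len(tokens), start + max_len) + 1):
            if tuple(lowered[start:end]) in dictionary:
                for i in range(start, end):
                    flags[i] = True
    return flags


def token_features(tokens, dictionary):
    """Build one feature dict per token, including the dictionary flag."""
    flags = dictionary_flags(tokens, dictionary)
    return [
        {"lower": tok.lower(), "is_title": tok.istitle(), "in_dict": flags[i]}
        for i, tok in enumerate(tokens)
    ]


dictionary = build_dictionary(["Example Motors AG"])
features = token_features("Shares of Example Motors rose".split(), dictionary)
```

Here the sentence mentions the company without its legal form, so only the generated alias produces a match. This is the kind of effect that alias generation is meant to capture.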

For the validation of knowledge represented in a knowledge base, we introduce COLT, a rule-based framework for knowledge validation based on the interactive quality assessment of logical rules. In its most expressive implementation, we combine Gaussian processes with neural networks to create COLT-GP, an interactive algorithm for learning rule models. Unlike other approaches, COLT-GP uses knowledge graph embeddings and user feedback to deal with data quality issues of knowledge graphs. The learned rule model can be used to conditionally apply a rule as well as to assess its quality.
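The following sketch illustrates only the Gaussian process part of such a setup and is not the COLT-GP algorithm itself: the neural network component is omitted, the embeddings are toy 2-d vectors, and the feedback encoding (+1 if the user confirms the rule's conclusion for an instance, -1 if the user rejects it) is an assumption. All function names are hypothetical. It shows how a GP posterior over embeddings could drive both conditional rule application and the choice of which instance to ask the user about next.

```python
import math


def rbf(a, b, length_scale=1.0):
    """Squared-exponential kernel between two embedding vectors."""
    sq = sum((x - y) ** 2 for x, y in zip(a, b))
    return math.exp(-sq / (2 * length_scale ** 2))


def solve(matrix, rhs):
    """Solve matrix @ x = rhs by Gaussian elimination with partial pivoting."""
    n = len(rhs)
    aug = [row[:] + [rhs[i]] for i, row in enumerate(matrix)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, n):
            factor = aug[r][col] / aug[col][col]
            for c in range(col, n + 1):
                aug[r][c] -= factor * aug[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        s = sum(aug[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (aug[r][n] - s) / aug[r][r]
    return x


def gp_posterior(train_x, train_y, query, noise=0.1):
    """Posterior mean and variance of a zero-mean GP at one query point."""
    n = len(train_x)
    gram = [[rbf(train_x[i], train_x[j]) + (noise if i == j else 0.0)
             for j in range(n)] for i in range(n)]
    k_star = [rbf(x, query) for x in train_x]
    mean = sum(k * a for k, a in zip(k_star, solve(gram, train_y)))
    var = rbf(query, query) - sum(k * w for k, w in zip(k_star, solve(gram, k_star)))
    return mean, max(var, 0.0)


def should_apply_rule(train_x, train_y, embedding):
    """Apply the rule only where feedback suggests it holds (mean > 0)."""
    mean, _ = gp_posterior(train_x, train_y, embedding)
    return mean > 0.0


def most_uncertain(train_x, train_y, candidates):
    """Pick the candidate with the highest variance as the next user query."""
    return max(candidates, key=lambda c: gp_posterior(train_x, train_y, c)[1])


# Toy feedback: the rule held near (0, 0) and failed near (3, 3).
feedback_x = [[0.0, 0.0], [0.1, 0.0], [3.0, 3.0], [3.1, 3.0]]
feedback_y = [1.0, 1.0, -1.0, -1.0]
```

Selecting the most uncertain instance for the next question is a standard uncertainty-sampling heuristic chosen here for illustration; the thesis may use a different query strategy.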

Finally, we present CurEx, a prototypical system for building domain-specific knowledge bases from structured and unstructured data sources. Its modular design is based on scalable technologies, which, in addition to processing large datasets, ensures that the modules can be easily exchanged or extended. CurEx offers multiple user interfaces, each tailored to the individual needs of a specific user group, and is fully compatible with the COLT framework, which can be used as part of the system.

We conduct a wide range of experiments with different datasets to determine the strengths and weaknesses of the proposed methods. To ensure the validity of our results, we compare the proposed methods with competing approaches.
