Hasso-Plattner-Institut
Prof. Dr. Felix Naumann
  
 

Open Master's Theses

I provide supervision for the following Master's thesis topic. Feel free to contact me to discuss further, to find a new topic, or to suggest a topic of your own. The typical workflow for writing a Master's thesis at our chair is described here.

Natural Language Processing for Patent Retrieval

Granted patents form an extensive knowledge base for information retrieval, an active research field in both academia and industry. Domain-specific terminology is especially challenging for state-of-the-art approaches. This Master's thesis therefore focuses on document representations that capture a patent's topics. These representations form the basis for a patent retrieval algorithm.
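
To make the idea of a document representation as the basis for retrieval concrete, here is a minimal sketch in Python. It uses plain TF-IDF vectors and cosine similarity as a simple stand-in baseline; it is not the representation the thesis will develop, which is meant to rely on topic models and document embeddings. The three patent texts, the query, and all function names are hypothetical and chosen only for illustration.

```python
import math
import re
from collections import Counter

# Hypothetical toy "patent abstracts"; not taken from any real dataset.
PATENTS = [
    "A rotor blade for a wind turbine with reduced noise emission",
    "Method for encrypting network packets using a rotating key",
    "Wind turbine tower with a damping system against vibration",
]


def tokenize(text):
    """Lowercase the text and split it into alphabetic tokens."""
    return re.findall(r"[a-z]+", text.lower())


def build_tfidf(docs):
    """Return sparse TF-IDF vectors (term -> weight) and the IDF table."""
    tokenized = [tokenize(doc) for doc in docs]
    n_docs = len(docs)
    doc_freq = Counter(term for tokens in tokenized for term in set(tokens))
    # Smoothed inverse document frequency; always positive.
    idf = {t: math.log((1 + n_docs) / (1 + df)) + 1.0 for t, df in doc_freq.items()}
    vectors = [
        {t: count * idf[t] for t, count in Counter(tokens).items()}
        for tokens in tokenized
    ]
    return vectors, idf


def cosine(a, b):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(weight * b.get(term, 0.0) for term, weight in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def search(query, vectors, idf):
    """Rank documents by similarity to the query, best match first."""
    counts = Counter(tokenize(query))
    # Query terms that never occur in the collection are ignored.
    query_vec = {t: c * idf[t] for t, c in counts.items() if t in idf}
    scored = [(cosine(query_vec, vec), idx) for idx, vec in enumerate(vectors)]
    return sorted(scored, reverse=True)


vectors, idf = build_tfidf(PATENTS)
ranking = search("wind turbine blade", vectors, idf)
for score, idx in ranking:
    print(f"{score:.3f}  {PATENTS[idx]}")
```

A purely lexical baseline like this cannot match a query to a patent that describes the same invention in different domain-specific terms, which is the gap that topic-aware representations are intended to close.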

In this Master's thesis, you will jointly mine both the topical and the spatial aspects of a dataset of 5 million patents in order to improve current retrieval models. For example, the inventor's address can be geocoded to an actual geolocation, so that regional patterns can be found. Besides regional patterns, you will analyse how patent topics change over time. To this end, you will work with topic modeling, document embedding, and geocoding.
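
The geocoding step can be sketched as follows, assuming the public Nominatim search endpoint and its `jsonv2` output format. The sketch only builds the request URL and parses a response; it does not send any request. The sample response is hand-written with made-up values, and the helper names, the sample addresses, and the one-degree grid used for regional counts are assumptions for illustration. Note that the public Nominatim instance has a usage policy, so geocoding millions of addresses would likely require a self-hosted instance or another service.

```python
import json
import math
from collections import Counter
from urllib.parse import urlencode

NOMINATIM_SEARCH = "https://nominatim.openstreetmap.org/search"


def build_geocode_url(address):
    """Build a Nominatim search URL for a free-form address string."""
    params = {"q": address, "format": "jsonv2", "limit": 1}
    return NOMINATIM_SEARCH + "?" + urlencode(params)


def parse_geocode_response(body):
    """Return (lat, lon) of the first hit, or None if nothing was found."""
    results = json.loads(body)
    if not results:
        return None
    first = results[0]
    # Nominatim returns coordinates as strings, so convert them to floats.
    return float(first["lat"]), float(first["lon"])


def grid_cell(lat, lon, cell_deg=1.0):
    """Map a coordinate to a coarse grid cell (hypothetical region proxy)."""
    return math.floor(lat / cell_deg), math.floor(lon / cell_deg)


# Hand-written sample response with made-up values, not real API output.
SAMPLE_RESPONSE = '[{"lat": "52.3940", "lon": "13.1330", "display_name": "example"}]'

url = build_geocode_url("Potsdam, Germany")
location = parse_geocode_response(SAMPLE_RESPONSE)

# Count patents per grid cell; the coordinates are illustrative only.
inventor_locations = [(52.394, 13.133), (52.52, 13.405), (48.137, 11.575)]
patents_per_cell = Counter(grid_cell(lat, lon) for lat, lon in inventor_locations)

print(url)
print(location)
print(patents_per_cell)
```

Binning coordinates into fixed grid cells is only one simple way to look for regional patterns; administrative regions or clustering would be equally plausible choices.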

This Master's thesis will be jointly supervised by Julian Risch and Ioannis Koumarelas.

A visualization of the number of granted patents per engineering doctorate holder (Source: https://www.nsf.gov/statistics/seind12/c8/fig08-50.gif)
Besides unstructured text data describing the invention, patents contain structured data, such as the inventor’s address.
An address can be geocoded to the actual geolocation, so that regional patterns can be found. (generated using https://nominatim.openstreetmap.org/)