Contact: Felix Naumann

PhD and postdoc scholarships at the HPI Research School are available regularly. The annual application deadline is August 15.

Open student assistant positions

Metadata Management System

Contact: Sebastian Kruse

What is this project about?
The Metadata Management System (MDMS) is an open source project that aims to unlock the potential of data profiling results, i.e., metadata. Data profiling algorithms reveal latent properties of datasets, e.g., functional and inclusion dependencies, that are a prerequisite for many data management tasks, such as data integration, query optimization, and schema reverse engineering. However, the mere availability of metadata, often in very large amounts, is generally not sufficient; instead, the metadata require further processing. This is where the MDMS comes into play: it stores metadata in a structured manner and offers analysis, interaction, and visualization capabilities that help to detect the relevant bits of the metadata and to gain actual insights from the combination of different metadata types.
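To make the two metadata types named above more concrete, here is a minimal sketch in Python of what it means for a functional dependency or a unary inclusion dependency to hold on a small table. It is only an illustration: the function names `holds_fd` and `holds_ind` and the toy data are assumptions made for this example and are not part of the MDMS, and real profiling algorithms discover such dependencies far more efficiently than by checking one candidate at a time.

```python
def holds_fd(rows, lhs, rhs):
    """Check whether the functional dependency lhs -> rhs holds in rows.

    The dependency holds if any two rows that agree on all lhs columns
    also agree on all rhs columns.
    """
    seen = {}
    for row in rows:
        key = tuple(row[col] for col in lhs)
        value = tuple(row[col] for col in rhs)
        # Remember the first rhs value per lhs value; any deviation violates the FD.
        if seen.setdefault(key, value) != value:
            return False
    return True


def holds_ind(dependent_rows, dependent_col, referenced_rows, referenced_col):
    """Check a unary inclusion dependency: every value of the dependent
    column must also occur in the referenced column."""
    referenced_values = {row[referenced_col] for row in referenced_rows}
    return all(row[dependent_col] in referenced_values for row in dependent_rows)


# Hypothetical toy data, invented for this illustration.
addresses = [
    {"zip": "14482", "city": "Potsdam", "street": "Prof.-Dr.-Helmert-Str."},
    {"zip": "14482", "city": "Potsdam", "street": "August-Bebel-Str."},
    {"zip": "10115", "city": "Berlin", "street": "Invalidenstr."},
]
zip_codes = [{"zip": "14482"}, {"zip": "10115"}, {"zip": "80331"}]

print(holds_fd(addresses, ["zip"], ["city"]))         # True
print(holds_fd(addresses, ["zip"], ["street"]))       # False
print(holds_ind(addresses, "zip", zip_codes, "zip"))  # True
```

Even this tiny table has many candidate dependencies, which hints at why profiling results on real datasets quickly become too numerous to inspect by hand.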

What are the tasks in this project?
The MDMS is evolving constantly, and not all components of the system are fully implemented yet. We also want to employ modern technologies for the different components: from Apache Cassandra as the data layer, through Apache Flink and Spark as the analytics layer, to Apache Zeppelin and Jupyter as interaction and visualization tools. In this project, you will evaluate such technologies w.r.t. their suitability for the MDMS, (re-)implement parts of the MDMS, and demonstrate the capabilities of your solution with practical examples. You are also encouraged to contribute to and discuss the overall design and concepts of the system.

What are the prerequisites to join this project?
The MDMS is implemented in Java and Scala, so you should be reasonably proficient in one of these languages. We don't expect you to know all the different systems mentioned above, but you should be interested in learning and adopting such technologies.

What do I learn from the project?
There are three things you can take away from the project: (1) You get to work on open source software. (2) You learn a lot about recent technologies in the database/information systems community. (3) You can familiarize yourself with the topics of our chair, especially data profiling, which might benefit you during your studies and your Master's thesis.

Business Communication Analysis

Contact: Tim Repke

About the project
Today's business communication is almost unimaginable without emails. They document discussions and decisions or summarise face-to-face meetings in the form of unstructured text or attachments and thus hold a significant amount of information about a business. In exceptional cases, for example when investigating a known case of fraud, specialists examine inboxes and attached files of involved personnel to determine the extent of the situation. However, the sheer quantity of emails is unmanageable without some guidance by an exploration tool. In this project, we develop and evaluate methods that can be combined in such an exploration tool. This work touches the fields of text mining, text summarisation, document classification, topic modelling, named entity extraction, entity linking, relationship extraction, as well as social network and graph analysis. We work together with our industry partner from the financial sector to put our prototypes in the hands of auditors for real-world feedback.

Tasks in this project
Email corpora come in various shapes and formats, which have to be parsed, normalised, and organised in a database for efficient access by downstream applications. We aim to develop a flexible and modular pipeline-oriented framework that can easily be extended with new components derived from promising experiments. Besides data processing and management, a prototype for a web-based exploration tool will be developed in a future bachelor project. We are looking to creatively integrate innovative interactive visualisation techniques that greatly improve the forensic workflow. As our student assistant, you'll support us in building tools for our experiments and evaluation as well as conceptual prototypes.
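The parse, normalise, and store steps mentioned above could be sketched roughly as follows, using only Python's standard library. This is a hedged illustration, not the project's actual framework: the step functions `parse`, `normalise`, and `store`, the `run_pipeline` helper, the `emails` table layout, and the sample message are all assumptions made for this example.

```python
import sqlite3
from email import message_from_string, policy
from email.utils import getaddresses

# Hypothetical sample message, invented for this illustration.
RAW_EMAIL = """\
From: Alice Example <Alice@Example.com>
To: Bob Example <bob@example.com>, carol@example.com
Subject: Meeting   summary

We agreed to postpone the audit.
"""


def parse(raw):
    """Parse a raw RFC 5322 message into a plain dictionary."""
    msg = message_from_string(raw, policy=policy.default)
    body = msg.get_body(preferencelist=("plain",))
    return {
        "sender": str(msg.get("From", "")),
        "recipients": str(msg.get("To", "")),
        "subject": str(msg.get("Subject", "")),
        "body": body.get_content() if body is not None else "",
    }


def normalise(record):
    """Lower-case addresses and collapse whitespace in the text fields."""
    sender = getaddresses([record["sender"]])
    recipients = getaddresses([record["recipients"]])
    return {
        "sender": sender[0][1].lower() if sender else "",
        "recipients": [address.lower() for _, address in recipients],
        "subject": " ".join(record["subject"].split()),
        "body": record["body"].strip(),
    }


def store(record, connection):
    """Write one normalised record to the database and pass it on."""
    connection.execute(
        "INSERT INTO emails (sender, recipients, subject, body) VALUES (?, ?, ?, ?)",
        (record["sender"], ",".join(record["recipients"]),
         record["subject"], record["body"]),
    )
    return record


def run_pipeline(raw, steps):
    """Feed the output of each step into the next one."""
    item = raw
    for step in steps:
        item = step(item)
    return item


connection = sqlite3.connect(":memory:")
connection.execute(
    "CREATE TABLE emails (sender TEXT, recipients TEXT, subject TEXT, body TEXT)"
)
steps = [parse, normalise, lambda record: store(record, connection)]
record = run_pipeline(RAW_EMAIL, steps)
print(record["sender"], record["recipients"], record["subject"])
```

Because every step is an ordinary function that takes and returns one item, a new component from an experiment (for example, a named entity extractor) could be tried out by adding one more function to the `steps` list.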

We can discuss weekly working hours, but hope to fill a 40h/month position.

Why should you join us?

  • apply and expand your coding skills
  • learn a lot about our research
  • be part of our research
  • maybe even a thesis(?)
  • ... because we are an awesome research group