Prof. Dr. h.c. Hasso Plattner

Open Master Theses in Data-Driven Decision-Making

We are looking for interested students to tackle the following master thesis topics in the area of data-driven decision-making:


Causal Structure Learning in Heterogeneous Settings

Causal structure learning (CSL), i.e., the derivation of the causal graphical model representing the functional relationships between variables of a system from observational data, is crucial to many domains. However, while most CSL methods assume a single data type or a specific family of functional relationships, most real-world scenarios involve heterogeneous settings, e.g., mixed data. As part of our research on Data-Driven Causal Inference, we address these challenges by introducing information-theoretic statistical methods and developing a modular pipeline for the experimental evaluation of CSL from observational data.

In this context, we offer several topic areas that address specific challenges, e.g.,

  • Information-theoretic quantification of causal effects via the do-operator in heterogeneous settings
  • Comparative evaluation and improvement of information-theoretic CSL in heterogeneous settings
  • Causal structure learning in real-world scenarios: challenges and opportunities

Contact: Johannes Huegle


Parallel Execution Strategies for Causal Structure Learning

Learning the causal relationships in observational data provides relevant insights for researchers in many domains, such as genomics or manufacturing. Determining the causal structures, in particular with constraint-based approaches on high-dimensional datasets, becomes a challenge with regard to single-threaded execution times. To overcome this obstacle, we investigate parallel execution strategies in multi-core and heterogeneous, GPU-accelerated systems. In this context, we offer to work on different challenges, e.g.,

  • optimize an existing GPU-based implementation for use in multi-GPU systems
  • survey existing constraint-based causal structure learning implementations, with a focus on parallel execution on different hardware, and provide an in-depth experimental comparison
  • investigate and optimize memory utilization for GPU-accelerated causal structure learning algorithms, in particular for settings that exceed the memory of a single GPU

Contact: Christopher Hagedorn


Data-Driven Decision-Making in Dynamic Pricing / Index Selection

The need for automated data-driven decision-making is ubiquitous and growing in importance. Our research focuses on data-driven decision-making using quantitative methods of operations research and data science, applied in the areas of (i) operations management and (ii) database optimization. We are particularly interested in the optimal control of stochastic dynamic systems in real-world applications under risk considerations.

In this context, potential topics for master theses are the following:

  • Index selection using dynamic programming techniques
  • Dynamic pricing under risk aversion
  • Pricing competition using reinforcement learning

Contact: Dr. Rainer Schlosser


Data-Informed Agile Retrospectives

Software development teams produce many development artifacts during their regular work. These rich data sets include code commits (complete with commit messages and metadata), work item descriptions in issue trackers, and test logs.

Our research focuses on enabling Agile development teams to make use of their own treasure troves of data to improve the way they work and collaborate. In particular, we investigate how data-informed process improvement can best be integrated into the existing practice of Scrum Retrospectives.

If you're interested in the above area (or have related ideas), get in touch!

Contact: Christoph Matthies


Combining Machine Learning and External Knowledge for Analyzing Gene Expression Profiles

Biomarkers are key enablers of precision medicine. They allow for precise disease diagnosis, selection of appropriate treatment, and assessment of disease progression. While a biomarker can be anything that is objectively measured in a patient, e.g., simple-to-measure heart rates or lab values, current research concentrates on identifying specific genes as biomarkers. Integrative biomarker detection strategies incorporate biological context into the analysis and lead to more robust and biologically meaningful biomarkers. Prior knowledge approaches are a special form of integrative analysis: they incorporate biological knowledge, such as gene-gene interactions or gene-disease associations, from public knowledge bases, e.g., Gene Ontology or KEGG.

We offer master thesis topics in the scope of prior knowledge approaches, covering the implementation of novel approaches or the extension of existing ones, the development of meaningful evaluation measures and strategies, and the assessment of the usefulness and impact of different knowledge bases and integration levels. Focusing on multi-omics approaches for biomarker detection, i.e., integrating data from multiple molecular levels (gene expression, protein, mutation data), is also an option.

We also offer student assistant (HiWi) positions that focus on implementation-related aspects.

Contact: Cindy Perscheid