Most questions that motivate data analysis in an enterprise context are causal in nature. For example, what causes the events observed in a manufacturing process, and what are their effects? However, because state-of-the-art statistical methods are associational, their results are often misinterpreted and lead to incorrect conclusions. A recently developed mathematical theory of interventions enables causal reasoning from observational data. Its practical use is limited by performance constraints, especially given the characteristics of enterprise data.
This master's project aims to extend an existing modular IT system that applies machine learning techniques to causal inference in a real-world context. To this end, you will dive deep into the core engine and improve existing approaches to constraint-based causal structure learning. Enterprise data often have complex and heterogeneous characteristics, high dimensionality, or large volumes. These properties can cause long execution times for such a pipeline, or even poor results. Your goal is to drastically improve the performance of existing approaches, e.g., through parallelization. You will also incorporate methods for heterogeneous data that the pipeline does not yet support.
Along the way, you will have the opportunity to deepen your knowledge of data science tools and improve your machine learning skills. You will also shape the performance of an end-to-end pipeline for causal inference.