Prof. Dr. h.c. mult. Hasso Plattner

# Data-Driven Causal Inference

## Probabilistic Machine Learning and Hardware Acceleration

We address open challenges in causal structure learning by improving the application of statistical and probabilistic concepts and by using GPU-based acceleration to support its use in real-world settings.

## Motivation

Knowledge of the causal structures within complex systems is crucial in many domains. In manufacturing, for example, causal structural knowledge is the basis of error avoidance in a production process. While domain experts within a company have enough expertise to identify the most common relationships, they need support as both the amount of observational data and the complexity of large systems increase. This gap can be closed by causal structure learning (CSL) algorithms, which derive the underlying causal structures from observational data, e.g., error messages and inline measurements of a production process.

## Background

The questions that motivate most data analysis are not associational but causal in nature: what are the causes, and what are the effects, of the events under observation? Nevertheless, the statistical methods commonly used today to answer those questions are associational. However, correlation does not imply causation, and misinterpreting associations often leads to incorrect conclusions.

In recent years, causality has grown from a nebulous concept into a mathematical theory. This is largely due to the work of Judea Pearl (2009), who received the 2011 Turing Award for “fundamental contributions to artificial intelligence through the development of a calculus for probabilistic and causal reasoning”.

The theory of causality is based on structural causal models, which combine features of structural equation models, the potential-outcome framework of Neyman and Rubin, and the causal graphical models developed for probabilistic reasoning and causal inference. In this framework, causal relationships are encoded in a causal graphical model: a finite set of nodes represents the involved variables, and the edges between them represent the causal relationships.
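To make the encoding concrete, the following is a minimal sketch of a structural causal model over three variables. The graph (Z → X, Z → Y, X → Y), the linear equations, their coefficients, and all function names are hypothetical and chosen only for illustration; they are not taken from any of the systems or libraries mentioned on this page.

```python
import random

# Hypothetical causal graph: each node maps to the list of its direct causes.
PARENTS = {"Z": [], "X": ["Z"], "Y": ["Z", "X"]}

# Hypothetical structural equations: value = f(parent values, noise).
MECHANISMS = {
    "Z": lambda pa, noise: noise,
    "X": lambda pa, noise: 0.8 * pa["Z"] + noise,
    "Y": lambda pa, noise: 1.5 * pa["X"] - 1.0 * pa["Z"] + noise,
}


def edges(parents):
    """List the directed edges (cause, effect) of the graph."""
    return [(p, child) for child, pa in parents.items() for p in pa]


def topological_order(parents):
    """Return the nodes so that every node appears after its parents."""
    order, placed = [], set()
    while len(order) < len(parents):
        progressed = False
        for node, pa in parents.items():
            if node not in placed and all(p in placed for p in pa):
                order.append(node)
                placed.add(node)
                progressed = True
        if not progressed:
            raise ValueError("graph contains a cycle")
    return order


def sample(rng):
    """Draw one observation by evaluating the equations in causal order."""
    values = {}
    for node in topological_order(PARENTS):
        pa = {p: values[p] for p in PARENTS[node]}
        values[node] = MECHANISMS[node](pa, rng.gauss(0.0, 1.0))
    return values


if __name__ == "__main__":
    print(edges(PARENTS))
    print(sample(random.Random(0)))
```

The graph carries the qualitative causal claims, while the mechanisms carry the quantitative ones. Causal structure learning aims to recover the former from samples alone.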

When the true causal structure is given, the so-called do-calculus allows the causal effects in the observed system to be identified. Moreover, the relationships in the causal graph form the basis of estimation procedures that derive the functional relationships needed to predict the result of an intervention on the system.
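The difference between observing and intervening can be illustrated with a small simulation. The sketch below assumes a hypothetical linear model with a confounder Z (Z → X, Z → Y, X → Y) and made-up coefficients. An intervention do(X = x) is simulated by replacing the equation for X with a constant. In this toy model the causal effect of X on Y is 1.5 by construction, whereas the observational regression slope of Y on X is about 1.01, because Z influences both variables.

```python
import random


def draw(rng, do_x=None):
    """Sample (x, y) from the toy model; do_x overrides the equation for X."""
    z = rng.gauss(0.0, 1.0)
    if do_x is None:
        x = 0.8 * z + rng.gauss(0.0, 1.0)  # X follows its natural mechanism
    else:
        x = do_x  # intervention: the edge Z -> X is cut
    y = 1.5 * x - 1.0 * z + rng.gauss(0.0, 1.0)
    return x, y


def observational_slope(n, seed):
    """Least-squares slope of Y on X from purely observational samples."""
    rng = random.Random(seed)
    pairs = [draw(rng) for _ in range(n)]
    mean_x = sum(x for x, _ in pairs) / n
    mean_y = sum(y for _, y in pairs) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in pairs) / n
    var = sum((x - mean_x) ** 2 for x, _ in pairs) / n
    return cov / var


def interventional_mean(x_value, n, seed):
    """Monte Carlo estimate of E[Y | do(X = x_value)]."""
    rng = random.Random(seed)
    return sum(draw(rng, do_x=x_value)[1] for _ in range(n)) / n


def causal_effect(n, seed):
    """Estimate E[Y | do(X = 1)] - E[Y | do(X = 0)]."""
    return interventional_mean(1.0, n, seed) - interventional_mean(0.0, n, seed)


if __name__ == "__main__":
    print("observational slope:", observational_slope(20000, seed=0))
    print("causal effect:", causal_effect(20000, seed=1))
```

In practice the structural equations are unknown, which is why identification via the causal graph and estimation from observational data are needed instead of direct simulation.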

## Research Goals

Our research on data-driven causal inference concentrates on several workstreams that together aim to enable efficient application in real-world scenarios.

Causal Structure Learning:

• Evaluation of real-world scenarios with respect to data characteristics and opportunities of CSL
• Introduction of an information-theoretic approach to improve CSL in the context of heterogeneous data
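As a generic illustration of how information-theoretic quantities can serve as a dependence measure in CSL, the sketch below computes a plug-in estimate of the conditional mutual information I(X; Y | Z) for discrete data. This is a textbook estimator and not necessarily the approach developed in this research. The function names and the toy data set are hypothetical.

```python
from collections import Counter
from math import log


def conditional_mutual_information(rows, x, y, z=()):
    """Plug-in estimate of I(X; Y | Z) in nats from discrete samples.

    rows: list of tuples; x, y: column indices; z: tuple of column indices.
    """
    n = len(rows)
    n_xyz, n_xz, n_yz, n_z = Counter(), Counter(), Counter(), Counter()
    for row in rows:
        zk = tuple(row[i] for i in z)
        n_xyz[(row[x], row[y], zk)] += 1
        n_xz[(row[x], zk)] += 1
        n_yz[(row[y], zk)] += 1
        n_z[zk] += 1
    total = 0.0
    for (xv, yv, zk), count in n_xyz.items():
        ratio = n_z[zk] * count / (n_xz[(xv, zk)] * n_yz[(yv, zk)])
        total += count / n * log(ratio)
    return total


def toy_rows():
    """Hypothetical data with columns (X, Y, Z): X and Y depend only on Z."""
    rows = []
    for z_val in (0, 1):
        likely, other = z_val, 1 - z_val
        for x_val, y_val, count in (
            (likely, likely, 9),
            (likely, other, 3),
            (other, likely, 3),
            (other, other, 1),
        ):
            rows += [(x_val, y_val, z_val)] * count
    return rows


if __name__ == "__main__":
    rows = toy_rows()
    print("I(X; Y)     =", conditional_mutual_information(rows, 0, 1))
    print("I(X; Y | Z) =", conditional_mutual_information(rows, 0, 1, z=(2,)))
```

In the toy data, X and Y are marginally dependent (roughly 0.03 nats) but become independent once Z is conditioned on. A constraint-based CSL algorithm would use this pattern to remove the edge between X and Y.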

Hard- and Software Acceleration:

• Development of the modular pipeline MPCSL for the experimental evaluation of CSL from observational data, and of MANM-CS for the generation of benchmark data; see MPCSL on GitHub and MANM-CS on GitHub
• Application of Graphics Processing Units (GPU) to develop efficient implementations in heterogeneous computing systems, see gpucsl on GitHub

## Use Cases

Together with various cooperation partners, we examine how causal inference can be applied in a real-world context.

Manufacturing:

Medicine:

• Application of CSL in the context of gene expression data

## Teaching

In the context of our research on data-driven causal inference, we offer several master's thesis topics that address specific challenges. If you are interested in any of these topics, please contact the responsible research assistant for further information.

Moreover, we offer student assistant positions that focus on implementation-related aspects. If you are interested in working in the field of causal structure learning, please contact us.

Winter Term 2021/22

• GPU-Accelerated Causal Structure Learning

Winter Term 2020/21

• A Benchmark Suite for Causal Inference or “How to model a complex world?”

Summer Term 2020

Winter Term 2019/20

Summer Term 2019

Winter Term 2018/19

Summer Term 2018

Winter Term 2017/18

## Publications

1. Hagedorn, C., Lange, C., Huegle, J., Schlosser, R.: GPU Acceleration for Information-theoretic Constraint-based Causal Discovery. In: Le, T.D., Liu, L., Kıcıman, E., Triantafyllou, S., and Liu, H. (eds.) Proceedings of The KDD’22 Workshop on Causal Discovery, Proceedings of Machine Learning Research (PMLR) 185. pp. 30–60 (2022).

2. Hagedorn, C., Huegle, J., Schlosser, R.: Understanding Unforeseen Production Downtimes in Manufacturing Processes using Log Data-driven Causal Reasoning. Journal of Intelligent Manufacturing. 33, 2027–2043 (2022).

3. Huegle, J., Hagedorn, C., Boehme, L., Poerschke, M., Umland, J., Schlosser, R.: MANM-CS: Data Generation for Benchmarking Causal Structure Learning from Mixed Discrete-Continuous and Nonlinear Data. WHY-21 @ NeurIPS 2021 (2021).

4. Hagedorn, C., Huegle, J.: Constraint-Based Causal Structure Learning in Multi-GPU Environments. In: Seidl, T., Fromm, M., and Obermeier, S. (eds.) Proceedings of the LWDA 2021 Workshops: FGWM, KDML, FGWI-BIA, and FGIR, Online, September 1-3, 2021. pp. 106–118. CEUR-WS.org (2021).

5. Huegle, J.: An Information-Theoretic Approach on Causal Structure Learning for Heterogeneous Data Characteristics of Real-World Scenarios. Proceedings of the Thirtieth International Joint Conference on Artificial Intelligence, IJCAI-21. pp. 4891–4892. International Joint Conferences on Artificial Intelligence Organization (2021).

6. Huegle, J., Hagedorn, C., Perscheid, M., Plattner, H.: MPCSL - A Modular Pipeline for Causal Structure Learning. Proceedings of the 27th ACM SIGKDD Conference on Knowledge Discovery & Data Mining. pp. 3068–3076. Association for Computing Machinery, New York, NY, USA (2021).

7. Hagedorn, C., Huegle, J.: GPU-Accelerated Constraint-Based Causal Structure Learning for Discrete Data. Proceedings of the 2021 SIAM International Conference on Data Mining (SDM). pp. 37–45 (2021).

8. Huegle, J., Hagedorn, C., Uflacker, M.: How Causal Structural Knowledge Adds Decision-Support in Monitoring of Automotive Body Shop Assembly Lines. In: Bessiere, C. (ed.) Proceedings of the Twenty-Ninth International Joint Conference on Artificial Intelligence, IJCAI-20. pp. 5246–5248. International Joint Conferences on Artificial Intelligence Organization (2020).

9. Schmidt, C., Huegle, J.: Towards a GPU-Accelerated Causal Inference. HPI Future SOC Lab - Proceedings 2017. pp. 187–194 (2020).

10. Schmidt, C., Huegle, J., Horschig, S., Uflacker, M.: Out-of-Core GPU-Accelerated Causal Structure Learning. Algorithms and Architectures for Parallel Processing. ICA3PP 2019. pp. 89–104. Springer International Publishing (2020).

11. Schmidt, C., Huegle, J., Bode, P., Uflacker, M.: Load-Balanced Parallel Constraint-Based Causal Structure Learning on Multi-Core Systems for High-Dimensional Data. SIGKDD Workshop on Causal Discovery. pp. 59–77 (2019).

12. Schmidt, C., Huegle, J., Uflacker, M.: Order-independent constraint-based causal structure learning for gaussian distribution models using GPUs. SSDBM ’18 Proceedings of the 30th International Conference on Scientific and Statistical Database Management. pp. 19:1–19:10. ACM, New York, NY, USA (2018).