Data-Driven Causal Inference

Probabilistic Machine Learning and Hardware Acceleration

We address open challenges in causal structure learning by improving both the application of statistical and probabilistic concepts and GPU-based acceleration, with the aim of supporting use in real-world settings.

Motivation

Knowledge of the causal structures within complex systems is crucial in many domains. In manufacturing, for example, causal structural knowledge forms the basis of error avoidance in a production process. While domain experts within a company have enough expertise to identify the most common relationships, they require support as both the amount of observational data and the complexity of large systems grow. This gap can be closed by causal structure learning (CSL) algorithms, which derive the underlying causal structures from observational data, e.g., error messages and inline measurements of a production process.

Background

The questions that motivate most data analysis are not associational but causal in nature. What are the causes and what are the effects of events under observation? Nevertheless, the statistical methods commonly used today to answer those questions are associational in nature. But “correlation does not imply causation”, and misinterpreting associations often leads to incorrect conclusions.
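The gap between association and causation can be illustrated with a small simulation. The following is a minimal sketch under assumed toy parameters (a hidden common cause Z, Gaussian noise, no causal edge from X to Y); the function names and coefficients are illustrative and not taken from any of the projects described here.

```python
import math
import random


def pearson(xs, ys):
    """Sample Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    return cov / math.sqrt(vx * vy)


def simulate(n, intervene, seed=0):
    """Draw n samples of (X, Y) with a hidden common cause Z and no X -> Y edge.

    If intervene is True, X is drawn independently of Z, which mimics
    setting X from outside the system (a randomized experiment).
    """
    rng = random.Random(seed)
    xs, ys = [], []
    for _ in range(n):
        z = rng.gauss(0, 1)                      # hidden common cause
        if intervene:
            x = rng.gauss(0, 1)                  # X no longer listens to Z
        else:
            x = z + rng.gauss(0, 0.5)            # X is driven by Z
        y = z + rng.gauss(0, 0.5)                # Y is driven by Z only
        xs.append(x)
        ys.append(y)
    return xs, ys


observational = pearson(*simulate(10_000, intervene=False))
interventional = pearson(*simulate(10_000, intervene=True))
print(f"correlation in observational data:  {observational:.2f}")
print(f"correlation under intervention on X: {interventional:.2f}")
```

In this toy setup the observational correlation is strong (its theoretical value is 1 / 1.25 = 0.8), although changing X has no effect on Y at all; once X is set externally, the correlation should be close to zero.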

In recent years, causality has grown from a nebulous concept into a mathematical theory. This is largely due to the work of Judea Pearl (2009), who received the 2011 Turing Award for “fundamental contributions to artificial intelligence through the development of a calculus for probabilistic and causal reasoning”.

Schematic representation of the constraint-based causal structure learning approach
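The constraint-based approach named in the caption above can be sketched in a few lines. The code below is a simplified illustration of the skeleton phase of the well-known PC algorithm, assuming linear Gaussian data and a Fisher z-test on partial correlations; it is not the implementation used in MPCSL or gpucsl, and all function names and the critical value are choices made for this example.

```python
import itertools
import math
import random


def corr_matrix(data):
    """Pearson correlation matrix of a list of rows (one row per sample)."""
    n, p = len(data), len(data[0])
    means = [sum(row[j] for row in data) / n for j in range(p)]
    cov = [[sum((row[i] - means[i]) * (row[j] - means[j]) for row in data) / n
            for j in range(p)] for i in range(p)]
    return [[cov[i][j] / math.sqrt(cov[i][i] * cov[j][j]) for j in range(p)]
            for i in range(p)]


def partial_corr(c, i, j, cond):
    """Partial correlation of i and j given cond, via the standard recursion."""
    if not cond:
        return c[i][j]
    k, rest = cond[0], cond[1:]
    r_ij = partial_corr(c, i, j, rest)
    r_ik = partial_corr(c, i, k, rest)
    r_jk = partial_corr(c, j, k, rest)
    return (r_ij - r_ik * r_jk) / math.sqrt((1 - r_ik ** 2) * (1 - r_jk ** 2))


def independent(c, n, i, j, cond, z_crit=3.29):
    """Fisher z-test; z_crit=3.29 is roughly a two-sided level of 0.001."""
    r = max(min(partial_corr(c, i, j, cond), 0.999999), -0.999999)
    z = 0.5 * math.log((1 + r) / (1 - r))
    return math.sqrt(n - len(cond) - 3) * abs(z) < z_crit


def pc_skeleton(data):
    """Start from a complete graph and delete edges between variables that
    test as conditionally independent, with growing conditioning-set size."""
    n, p = len(data), len(data[0])
    c = corr_matrix(data)
    adj = {i: set(range(p)) - {i} for i in range(p)}
    level = 0
    while any(len(adj[i]) - 1 >= level for i in range(p)):
        for i in range(p):
            for j in sorted(adj[i]):
                # Condition on subsets of i's remaining neighbours.
                for cond in itertools.combinations(sorted(adj[i] - {j}), level):
                    if independent(c, n, i, j, cond):
                        adj[i].discard(j)
                        adj[j].discard(i)
                        break
        level += 1
    return {(i, j) for i in adj for j in adj[i] if i < j}


# Toy data from the chain X -> Y -> Z (columns 0, 1, 2).
rng = random.Random(1)
data = []
for _ in range(5000):
    x = rng.gauss(0, 1)
    y = x + rng.gauss(0, 1)
    z = y + rng.gauss(0, 1)
    data.append((x, y, z))

skeleton = pc_skeleton(data)
print(sorted(skeleton))
```

For the chain in this example, the edge between columns 0 and 2 should be removed once column 1 is conditioned on, so that only the two chain edges remain. Orienting the remaining edges is a separate step that this sketch omits.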

The theory of causality is based upon structural causal models, which combine features of structural equation models, the potential-outcome framework of Neyman and Rubin, and the causal graphical models developed for probabilistic reasoning and causal inference. In this framework, causal relationships are encoded in a causal graphical model that consists of a finite set of nodes and edges representing the involved variables and the causal relationships between them, respectively.
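As an illustration of this encoding, a causal graph can be stored as a list of directed edges, and a structural causal model adds one equation per node that computes its value from its parents and a noise term. The sketch below uses a hypothetical three-variable manufacturing example; the variable names, coefficients, and helper functions are assumptions made for illustration only.

```python
import random
from graphlib import TopologicalSorter

# Hypothetical causal graph: each pair is (cause, effect).
edges = [
    ("temperature", "tool_wear"),
    ("temperature", "error_rate"),
    ("tool_wear", "error_rate"),
]


def parents_of(edges):
    """Map every node to the set of its direct causes."""
    nodes = {v for edge in edges for v in edge}
    parents = {v: set() for v in nodes}
    for cause, effect in edges:
        parents[effect].add(cause)
    return parents


parents = parents_of(edges)
# Causes come before their effects in a topological order of the graph.
order = list(TopologicalSorter(parents).static_order())

# One structural equation per node: value = f(parent values, noise).
equations = {
    "temperature": lambda pa, noise: noise,
    "tool_wear": lambda pa, noise: 0.8 * pa["temperature"] + noise,
    "error_rate": lambda pa, noise: (0.5 * pa["temperature"]
                                     + 1.2 * pa["tool_wear"] + noise),
}


def sample(rng):
    """Draw one joint observation by evaluating the equations in causal order."""
    values = {}
    for node in order:
        pa = {p: values[p] for p in parents[node]}
        values[node] = equations[node](pa, rng.gauss(0, 1))
    return values


draw = sample(random.Random(0))
print(order)
print(draw)
```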

When the true causal structure is given, the so-called do-calculus allows the causal effects in the observed system to be identified. Moreover, the relationships in the causal graph form the basis of estimation procedures that derive the functional relationships needed to predict the result of an intervention on the system.
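One well-known identification result in this framework is the back-door adjustment. Assuming a set of observed variables Z satisfies the back-door criterion relative to a cause X and an effect Y in the causal graph, the effect of intervening on X can be written purely in terms of observational quantities:

```latex
P(y \mid \mathrm{do}(x)) = \sum_{z} P(y \mid x, z)\, P(z)
```

The left-hand side describes the system under an intervention, while every term on the right-hand side can be estimated from observational data; which set Z is admissible depends on the causal graph.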

Research Goals

Our research on data-driven causal inference is organized into several workstreams that together aim to enable efficient application in real-world scenarios.

Causal Structure Learning:

Evaluation of real-world scenarios with respect to data characteristics and opportunities of CSL
Introduction of an information-theoretic approach to improve CSL in the context of heterogeneous data
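One common information-theoretic quantity for testing conditional independence is the conditional mutual information I(X; Y | Z), which is zero exactly when X and Y are independent given Z. The following is a minimal sketch of a plug-in estimator for purely discrete data; it is shown only to illustrate the idea and is an assumption of this example, not a description of the approach developed in this project. Heterogeneous (mixed discrete-continuous) data would need a different estimator.

```python
from collections import Counter
from math import log


def conditional_mutual_information(samples):
    """Plug-in estimate of I(X; Y | Z) in nats from (x, y, z) tuples
    of discrete values, using empirical frequencies."""
    n = len(samples)
    n_xyz = Counter(samples)
    n_xz = Counter((x, z) for x, _, z in samples)
    n_yz = Counter((y, z) for _, y, z in samples)
    n_z = Counter(z for _, _, z in samples)
    # sum over p(x,y,z) * log( p(z) p(x,y,z) / (p(x,z) p(y,z)) )
    return sum(
        (c / n) * log(c * n_z[z] / (n_xz[(x, z)] * n_yz[(y, z)]))
        for (x, y, z), c in n_xyz.items()
    )


# X and Y independent given Z: every (x, y, z) combination occurs once.
independent_data = [(x, y, z) for x in (0, 1) for y in (0, 1) for z in (0, 1)]

# Y is a copy of X, regardless of Z.
dependent_data = [(x, x, z) for x in (0, 1) for z in (0, 1)]

cmi_independent = conditional_mutual_information(independent_data)
cmi_dependent = conditional_mutual_information(dependent_data)
print(cmi_independent, cmi_dependent)
```

In the first data set the estimate is zero; in the second it equals log 2, the entropy of X given Z. On finite random samples the estimate is rarely exactly zero, so a practical test compares it against a threshold or a null distribution.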

Hardware and Software Acceleration:

Development of the modular pipeline MPCSL for experimental evaluation of CSL from observational data, and of MANM-CS for generation of benchmark data (see MPCSL on GitHub and MANM-CS on GitHub)
Application of Graphics Processing Units (GPUs) to develop efficient implementations in heterogeneous computing systems (see gpucsl on GitHub)

Use Cases

Together with various cooperation partners, we examine how causal inference can be applied in real-world contexts.

Manufacturing:

Identification of the relevant factors involved in an industrial manufacturing process
Derivation of causal structures in complex production processes, e.g., Bachelor Project: Application of Causal Inference in Automotive Production

Medicine:

Application of CSL in the context of gene expression data

Teaching

As part of our research on data-driven causal inference, we offer several master's thesis topic areas that address specific challenges. If you are interested in any of these topics, please feel free to contact the responsible research assistant for further information.

Moreover, we offer student assistant positions that focus on implementation-related aspects. If you are interested in working in the field of causal structure learning, contact us.

Winter Term 2021/22

GPU-Accelerated Causal Structure Learning

Winter Term 2020/21

A Benchmark Suite for Causal Inference or “How to model a complex world?”

Summer Term 2020

Lecture: Causal Inference - Theory and Applications in Enterprise Computing

Winter Term 2019/20

Master Project: Causal Reasoning on Enterprise Data

Summer Term 2019

Lecture: Causal Inference - Theory and Applications in Enterprise Computing

Winter Term 2018/19

Master Project: Design and Implementation of a Causal Inference Pipeline

Summer Term 2018

Winter Term 2017/18

Bachelor Project: Application of Causal Inference in Automotive Production

Publications

Hagedorn, C., Lange, C., Huegle, J., Schlosser, R.: GPU Acceleration for Information-theoretic Constraint-based Causal Discovery. In: Le, T.D., Liu, L., Kıcıman, E., Triantafyllou, S., and Liu, H. (eds.) Proceedings of The KDD’22 Workshop on Causal Discovery, Proceedings of Machine Learning Research (PMLR) 185. pp. 30–60 (2022).

Hagedorn, C., Huegle, J., Schlosser, R.: Understanding Unforeseen Production Downtimes in Manufacturing Processes using Log Data-driven Causal Reasoning. Journal of Intelligent Manufacturing. 33, 2027–2043 (2022).

Huegle, J., Hagedorn, C., Boehme, L., Poerschke, M., Umland, J., Schlosser, R.: MANM-CS: Data Generation for Benchmarking Causal Structure Learning from Mixed Discrete-Continuous and Nonlinear Data. WHY-21 @ NeurIPS 2021 (2021).

Hagedorn, C., Huegle, J.: Constraint-Based Causal Structure Learning in Multi-GPU Environments. In: Seidl, T., Fromm, M., and Obermeier, S. (eds.) Proceedings of the LWDA 2021 Workshops: FGWM, KDML, FGWI-BIA, and FGIR, Online, September 1-3, 2021. pp. 106–118. CEUR-WS.org (2021).

Huegle, J.: An Information-Theoretic Approach on Causal Structure Learning for Heterogeneous Data Characteristics of Real-World Scenarios. Proceedings of the Thirtieth International Joint Conference on Artificial Intelligence, IJCAI-21. pp. 4891–4892. International Joint Conferences on Artificial Intelligence Organization (2021).

Huegle, J., Hagedorn, C., Perscheid, M., Plattner, H.: MPCSL - A Modular Pipeline for Causal Structure Learning. Proceedings of the 27th ACM SIGKDD Conference on Knowledge Discovery & Data Mining. pp. 3068–3076. Association for Computing Machinery, New York, NY, USA (2021).

Hagedorn, C., Huegle, J.: GPU-Accelerated Constraint-Based Causal Structure Learning for Discrete Data. Proceedings of the 2021 SIAM International Conference on Data Mining (SDM). pp. 37–45 (2021).

Huegle, J., Hagedorn, C., Uflacker, M.: How Causal Structural Knowledge Adds Decision-Support in Monitoring of Automotive Body Shop Assembly Lines. In: Bessiere, C. (ed.) Proceedings of the Twenty-Ninth International Joint Conference on Artificial Intelligence, IJCAI-20. pp. 5246–5248. International Joint Conferences on Artificial Intelligence Organization (2020).

Schmidt, C., Huegle, J.: Towards a GPU-Accelerated Causal Inference. HPI Future SOC Lab - Proceedings 2017. pp. 187–194 (2020).

Schmidt, C., Huegle, J., Horschig, S., Uflacker, M.: Out-of-Core GPU-Accelerated Causal Structure Learning. Algorithms and Architectures for Parallel Processing. ICA3PP 2019. pp. 89–104. Springer International Publishing (2020).

Schmidt, C., Huegle, J., Bode, P., Uflacker, M.: Load-Balanced Parallel Constraint-Based Causal Structure Learning on Multi-Core Systems for High-Dimensional Data. SIGKDD Workshop on Causal Discovery. pp. 59–77 (2019).

Schmidt, C., Huegle, J., Uflacker, M.: Order-independent constraint-based causal structure learning for gaussian distribution models using GPUs. SSDBM ’18 Proceedings of the 30th International Conference on Scientific and Statistical Database Management. pp. 19:1–19:10. ACM, New York, NY, USA (2018).

Contact

Christopher Hagedorn (né Schmidt), Johannes Huegle, Dr. Rainer Schlosser
