Hasso-Plattner-Institut
Prof. Dr. h.c. Hasso Plattner
  
 

Data-Driven Causal Inference

Probabilistic Machine Learning and Hardware Acceleration

We address open challenges in causal structure learning through improvements in two areas, the application of statistical and probabilistic concepts and GPU-based acceleration, with the aim of enabling its use in real-world settings.

Motivation

Knowledge of the causal structures within complex systems is crucial in many domains. In manufacturing, for example, knowledge of causal relationships forms the basis for avoiding errors in production processes. While domain experts within a company have enough expertise to identify the most common relationships, they need support as both the amount of observational data and the complexity of large systems increase. Causal structure learning algorithms can close this gap by deriving the underlying causal relationships from observational data, e.g., error messages and inline measurements of a production process.

Background

The questions that motivate most data analysis are not associational but causal in nature: What are the causes and what are the effects of the events under observation? Nevertheless, the statistical methods commonly used today to answer these questions are associational. But “Correlation does not imply causation!”, and reading correlations as causal relationships often leads to incorrect conclusions.
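
To make this warning concrete, here is a minimal simulation, assuming a hypothetical linear model in which a hidden common cause Z drives both X and Y while X has no effect on Y at all. The functions `pearson` and `simulate` are illustrative names only. The observational data show a strong correlation between X and Y, yet setting X by intervention (drawing it independently of Z) makes that correlation disappear.

```python
import math
import random


def pearson(a, b):
    """Sample Pearson correlation coefficient of two equally long lists."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    va = sum((x - ma) ** 2 for x in a)
    vb = sum((y - mb) ** 2 for y in b)
    return cov / math.sqrt(va * vb)


def simulate(n=10_000, intervene=False, seed=0):
    """Draw (X, Y) pairs from a hypothetical model with confounder Z.

    Observational regime:  X := 2Z + noise,  Y := 3Z + noise (X absent from Y).
    Interventional regime: X is set independently of Z, mimicking do(X).
    """
    rng = random.Random(seed)
    xs, ys = [], []
    for _ in range(n):
        z = rng.gauss(0.0, 1.0)  # hidden common cause
        x = rng.gauss(0.0, 1.0) if intervene else 2.0 * z + rng.gauss(0.0, 1.0)
        y = 3.0 * z + rng.gauss(0.0, 1.0)  # Y depends on Z only
        xs.append(x)
        ys.append(y)
    return xs, ys


print(f"observational corr(X, Y):  {pearson(*simulate()):.2f}")
print(f"interventional corr(X, Y): {pearson(*simulate(intervene=True)):.2f}")
```

In the observational regime the population correlation is 6/√50 ≈ 0.85 (Cov(X, Y) = 6, Var X = 5, Var Y = 10), while under the intervention it is 0, so the sample values land close to these.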

In recent years, causality has grown from a nebulous concept into a mathematical theory. This is largely due to the work of Judea Pearl (2009), who received the 2011 Turing Award for “fundamental contributions to artificial intelligence through the development of a calculus for probabilistic and causal reasoning”.

Schematic representation of the constraint-based causal structure learning approach

The theory of causality is based on structural causal models, which combine features of structural equation models, the potential-outcome framework of Neyman and Rubin, and the causal graphical models developed for probabilistic reasoning and causal inference. In this framework, causal relationships are encoded in a causal graphical model consisting of a finite set of nodes, which represent the involved variables, and edges, which represent the causal relationships between them.
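
As an illustration of this encoding, the sketch below represents a small, hypothetical structural causal model as a parent map (the edges of the graph) plus one structural equation per node, and samples it in topological order. The variable names, coefficients, and the helpers `edges` and `sample` are assumptions made for illustration. An intervention do(V = c) is modelled, as in Pearl's framework, by replacing the equation of V with the constant c, which cuts all edges pointing into V.

```python
import random
from graphlib import TopologicalSorter

# Hypothetical SCM with graph Z -> X, Z -> Y, X -> Y.
# PARENTS maps each node to its direct causes (incoming edges).
PARENTS = {"Z": [], "X": ["Z"], "Y": ["Z", "X"]}

# One structural equation per node: value = f(parent values, exogenous noise u).
EQUATIONS = {
    "Z": lambda pa, u: u,
    "X": lambda pa, u: 2.0 * pa["Z"] + u,
    "Y": lambda pa, u: 1.5 * pa["X"] + 3.0 * pa["Z"] + u,
}


def edges(parents):
    """Directed edges (cause, effect) of the causal graph."""
    return sorted((p, child) for child, ps in parents.items() for p in ps)


def sample(parents, equations, rng, interventions=None):
    """Draw one joint sample; interventions maps node -> constant for do(node = c)."""
    interventions = interventions or {}
    values = {}
    # TopologicalSorter expects node -> predecessors, so causes are evaluated first.
    for v in TopologicalSorter(parents).static_order():
        if v in interventions:
            values[v] = interventions[v]  # equation replaced: incoming edges are cut
        else:
            pa = {p: values[p] for p in parents[v]}
            values[v] = equations[v](pa, rng.gauss(0.0, 1.0))
    return values


rng = random.Random(42)
print(edges(PARENTS))  # [('X', 'Y'), ('Z', 'X'), ('Z', 'Y')]
print(sample(PARENTS, EQUATIONS, rng, interventions={"X": 1.0}))
```

Keeping the graph and the equations separate mirrors the distinction in the text: the graph alone is what structure learning tries to recover, while the equations are what estimation procedures fit afterwards.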

When the true causal structure is known, the so-called do-calculus allows the causal effects in the observed system to be identified, i.e., expressed in terms of the observational distribution. Moreover, the relationships in the causal graph form the basis of estimation procedures that derive the functional relationships needed to predict the result of an intervention on the system.
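
One identification result that follows from the do-calculus is the backdoor adjustment: if a set Z blocks every backdoor path from X to Y, then P(y | do(x)) = Σ_z P(y | x, z) P(z). The sketch below evaluates this formula on a hypothetical binary model with graph Z → X, Z → Y, X → Y (all probabilities are made up for illustration) and contrasts it with the naive conditional P(y | x). Both estimators only read the observational joint distribution, which is the point of identification.

```python
from itertools import product

# Hypothetical binary model: Z -> X, Z -> Y, X -> Y.
P_Z = {0: 0.5, 1: 0.5}
P_X1_GIVEN_Z = {0: 0.2, 1: 0.8}
P_Y1_GIVEN_XZ = {(0, 0): 0.1, (1, 0): 0.2, (0, 1): 0.5, (1, 1): 0.6}
NAMES = ("z", "x", "y")


def joint():
    """Observational joint distribution P(z, x, y) implied by the model."""
    p = {}
    for z, x, y in product((0, 1), repeat=3):
        px = P_X1_GIVEN_Z[z] if x == 1 else 1.0 - P_X1_GIVEN_Z[z]
        py = P_Y1_GIVEN_XZ[(x, z)] if y == 1 else 1.0 - P_Y1_GIVEN_XZ[(x, z)]
        p[(z, x, y)] = P_Z[z] * px * py
    return p


def prob(p, **fixed):
    """Marginal probability of the assignment in `fixed`, e.g. prob(p, x=1, y=0)."""
    return sum(v for key, v in p.items()
               if all(key[NAMES.index(k)] == val for k, val in fixed.items()))


def naive_y1(p, x):
    """Associational quantity P(Y=1 | X=x)."""
    return prob(p, x=x, y=1) / prob(p, x=x)


def adjusted_y1(p, x):
    """Backdoor adjustment P(Y=1 | do(X=x)) = sum_z P(Y=1 | x, z) P(z)."""
    return sum(prob(p, z=z, x=x, y=1) / prob(p, z=z, x=x) * prob(p, z=z)
               for z in (0, 1))


p = joint()
print("naive effect:   ", round(naive_y1(p, 1) - naive_y1(p, 0), 2))
print("adjusted effect:", round(adjusted_y1(p, 1) - adjusted_y1(p, 0), 2))
```

Here the naive contrast P(Y=1 | X=1) − P(Y=1 | X=0) = 0.52 − 0.18 = 0.34 overstates the causal effect 0.40 − 0.30 = 0.10, because X = 1 occurs mostly when Z = 1.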

Research Goals

Our research in data-driven causal inference concentrates on several workstreams that together aim to enable efficient application in real-world scenarios.

Causal Structure Learning (CSL):

  • Evaluation of real-world scenarios with respect to data characterization and opportunities for CSL
  • Introduction of consistent information-theoretic dependence measures
  • More flexible learning algorithms for heterogeneous data characteristics
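
The specific measures are not detailed here. As one classic example of an information-theoretic dependence measure, the following sketch computes a plug-in estimate of the conditional mutual information I(X; Y | Z) from discrete samples; the function name is hypothetical and this is not presented as the measure developed in the project. The estimate is zero exactly when X and Y are independent given Z in the empirical distribution, which is the kind of statement a constraint-based learner relies on. For a significance test, 2·n·Î (in nats) is the G statistic, which is asymptotically chi-square distributed under conditional independence.

```python
import math
from collections import Counter


def conditional_mutual_information(xs, ys, zs):
    """Plug-in estimate of I(X; Y | Z) in nats from paired discrete samples."""
    n = len(xs)
    c_xyz = Counter(zip(xs, ys, zs))
    c_xz = Counter(zip(xs, zs))
    c_yz = Counter(zip(ys, zs))
    c_z = Counter(zs)
    cmi = 0.0
    for (x, y, z), n_xyz in c_xyz.items():
        # p(x,y,z) * log[p(x,y,z) p(z) / (p(x,z) p(y,z))]; the 1/n factors cancel
        ratio = n_xyz * c_z[z] / (c_xz[(x, z)] * c_yz[(y, z)])
        cmi += (n_xyz / n) * math.log(ratio)
    return cmi


# Y copies X within each stratum of Z: strong conditional dependence.
print(conditional_mutual_information([0, 1, 0, 1], [0, 1, 0, 1], [0, 0, 1, 1]))
# X and Y vary independently within the single stratum Z = 0.
print(conditional_mutual_information([0, 0, 1, 1], [0, 1, 0, 1], [0, 0, 0, 0]))
```

The first call returns ln 2 (about 0.693 nats), the second returns 0. Plug-in estimates are biased upwards for small samples, which is one reason why consistency of a dependence measure matters in practice.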

Hard- and Software Acceleration:

  • Engineering of a modular IT system that enables the application of machine learning techniques in the context of causal inference
  • Application of Graphics Processing Units (GPUs) to develop efficient implementations
  • Examination of integration into in-memory databases
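
To illustrate why constraint-based learning lends itself to GPU parallelization, the sketch below implements the skeleton phase of an order-independent, PC-stable-style algorithm in plain Python. Within each level ℓ, all edge tests read the same adjacency snapshot and condition on sets of size ℓ, so they are independent of each other and can be dispatched in parallel; a `ThreadPoolExecutor` stands in for GPU threads here. The functions `stable_skeleton` and `chain_oracle` and the callback signature `ci_test(i, j, S)` are hypothetical; in practice `ci_test` would be a statistical test, e.g., Fisher's z-test on partial correlations for Gaussian data. This is not the GPU implementation from the publications listed below, and CPython threads will not speed up CPU-bound tests because of the global interpreter lock.

```python
from concurrent.futures import ThreadPoolExecutor
from functools import partial
from itertools import combinations


def _test_edge(ci_test, snapshot, level, pair):
    """Search for a separating set of size `level` among the neighbours of i."""
    i, j = pair
    for cond in combinations(sorted(snapshot[i] - {j}), level):
        if ci_test(i, j, cond):
            return pair, cond
    return pair, None


def stable_skeleton(n_vars, ci_test, workers=4):
    """Order-independent skeleton discovery (PC-stable style).

    ci_test(i, j, S) must return True if X_i and X_j are judged
    conditionally independent given the variables with indices in S.
    Returns the undirected edges and the separating sets found.
    """
    adj = {i: set(range(n_vars)) - {i} for i in range(n_vars)}
    sepsets = {}
    level = 0
    while True:
        # All tests of this level read the same frozen adjacency snapshot,
        # so the outcome does not depend on the order of edge processing.
        snapshot = {i: frozenset(nbrs) for i, nbrs in adj.items()}
        candidates = [(i, j) for i in range(n_vars) for j in sorted(snapshot[i])
                      if len(snapshot[i]) - 1 >= level]
        if not candidates:
            break
        # Tests of one level are mutually independent and can run concurrently.
        with ThreadPoolExecutor(max_workers=workers) as pool:
            results = list(pool.map(partial(_test_edge, ci_test, snapshot, level),
                                    candidates))
        # Apply all removals only after the whole level has been tested.
        for (i, j), cond in results:
            if cond is not None:
                adj[i].discard(j)
                adj[j].discard(i)
                sepsets.setdefault(frozenset((i, j)), set(cond))
        level += 1

    edges = sorted({tuple(sorted((i, j))) for i in adj for j in adj[i]})
    return edges, sepsets


def chain_oracle(i, j, cond):
    """Hypothetical oracle for the chain 0 -> 1 -> 2: only 0 and 2 are
    independent, and only when conditioning on 1."""
    return {i, j} == {0, 2} and 1 in cond


print(stable_skeleton(3, chain_oracle))
```

With the chain oracle, the call returns the edges (0, 1) and (1, 2) and records {1} as the separating set of 0 and 2. Deferring all removals to the end of a level costs some extra tests compared with the original PC algorithm, but it is what makes the result independent of the processing order.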

Use Cases

Together with various cooperation partners, we examine how causal inference can be applied in real-world settings.

Manufacturing:

Medicine:

  • Application of causal inference in the context of gene expression data
  • Incorporation of genetic variants and gene expression data to detect causal relationships

Publications

  • Schmidt, C., Huegle, J., Bode, P., Uflacker, M.: Load-Balanced Parallel Constraint-Based Causal Structure Learning on Multi-Core Systems for High-Dimensional Data. SIGKDD Workshop on Causal Discovery, pp. 59–77 (2019).

  • Schmidt, C., Huegle, J., Uflacker, M.: Order-independent constraint-based causal structure learning for gaussian distribution models using GPUs. SSDBM '18: Proceedings of the 30th International Conference on Scientific and Statistical Database Management, pp. 19:1–19:10. ACM, New York, NY, USA (2018).

A Pipeline for Causal Inference