Hasso-Plattner-Institut
Prof. Dr. h.c. Hasso Plattner
  
 

Data-Driven Causal Inference

Probabilistic Machine Learning and Hardware Acceleration

We address open challenges in causal inference by improving the application of statistical and probabilistic concepts and by using GPU-based acceleration to support its use in real-world settings.

Motivation

The emergence of the Internet of Things (IoT) allows for a comprehensive analysis of industrial manufacturing processes. While domain experts within a company have enough expertise to identify the most common relationships, they need support as both the amount of observational data and the complexity of large systems of observed features increase. This gap can be closed by machine learning algorithms for causal inference that derive the underlying causal relationships between the observed features.

Background

The questions that motivate most data analysis are not associational but causal in nature: what are the causes and what are the effects of the events under observation? Nevertheless, the statistical methods commonly used today to answer those questions are associational. Correlation does not imply causation, and misinterpreting associations as causal often leads to incorrect conclusions.

In recent years, causality has grown from a nebulous concept into a mathematical theory. This is largely due to the work of Judea Pearl (2009), who received the 2011 Turing Award for “fundamental contributions to artificial intelligence through the development of a calculus for probabilistic and causal reasoning”.

Figure: Schematic representation of the constraint-based causal structure learning approach.

The causal theory is based on structural causal models, which combine features of structural equation models (SEM), the potential-outcome framework of Neyman and Rubin, and the graphical models developed for probabilistic reasoning and causal analysis. In this framework, causal relationships are encoded in a causal graph whose finite sets of nodes and edges represent the involved variables and the causal relationships between them, respectively.
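As a minimal sketch of how such a causal graph can be encoded, the following Python snippet stores, for each node, the set of its direct causes. The variable names are hypothetical manufacturing features chosen for illustration, and the snippet assumes the graph is a directed acyclic graph, which the text above does not state explicitly. The helper names `edges` and `causal_order` are likewise illustrative.

```python
from graphlib import CycleError, TopologicalSorter

# Hypothetical variables: each node maps to the set of its direct causes (parents).
parents = {
    "temperature": set(),
    "pressure": {"temperature"},
    "defect_rate": {"temperature", "pressure"},
}


def edges(parents):
    """List the directed edges (cause, effect) encoded by the parent sets."""
    return sorted(
        (cause, effect) for effect, causes in parents.items() for cause in causes
    )


def causal_order(parents):
    """Return the nodes with every cause listed before its effects.

    Raises graphlib.CycleError if the graph contains a directed cycle.
    """
    return list(TopologicalSorter(parents).static_order())


def is_acyclic(parents):
    """Check the acyclicity assumption without raising."""
    try:
        causal_order(parents)
        return True
    except CycleError:
        return False


print(edges(parents))
print(causal_order(parents))
```

Storing parents rather than children is a design choice: the structural equation of a variable depends only on its direct causes, so a parent map lets each variable be evaluated in causal order.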

There are two common approaches for learning causal structures:

  • Search-and-score approaches assign a score to each candidate causal structure given the observational data and search for the structure with the best score.
  • Constraint-based approaches query the observational data for conditional independencies to obtain an undirected skeleton. Building on this skeleton, the algorithms determine the orientation of the detected relationships to construct a causal structure.
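The skeleton step of a constraint-based approach can be sketched as follows. This is a simplified illustration in the style of the PC algorithm, which the text above does not name, so that choice is an assumption. It further assumes Gaussian data, for which conditional independence can be tested with partial correlations and a Fisher z-test. The function names, the example correlation matrix, and the sample size are hypothetical. This is the basic, order-dependent variant, not the order-independent GPU implementation described in the publication below.

```python
from itertools import combinations
from math import atanh, sqrt
from statistics import NormalDist


def partial_corr(corr, i, j, cond):
    """Partial correlation of i and j given the variables in cond (recursive formula)."""
    if not cond:
        return corr[i][j]
    k, rest = cond[0], cond[1:]
    r_ij = partial_corr(corr, i, j, rest)
    r_ik = partial_corr(corr, i, k, rest)
    r_jk = partial_corr(corr, j, k, rest)
    return (r_ij - r_ik * r_jk) / sqrt((1 - r_ik ** 2) * (1 - r_jk ** 2))


def is_independent(corr, n_samples, i, j, cond, alpha):
    """Fisher z-test: True if 'i independent of j given cond' is not rejected."""
    r = partial_corr(corr, i, j, cond)
    r = max(min(r, 0.999999), -0.999999)  # keep atanh finite
    z = sqrt(n_samples - len(cond) - 3) * abs(atanh(r))
    p_value = 2 * (1 - NormalDist().cdf(z))
    return p_value > alpha


def learn_skeleton(corr, n_samples, alpha=0.01):
    """Start from a complete undirected graph and delete edges between
    variables found to be conditionally independent, using conditioning
    sets of increasing size drawn from the current neighbours."""
    n_vars = len(corr)
    adjacent = {i: set(range(n_vars)) - {i} for i in range(n_vars)}
    level = 0
    while any(len(adjacent[i]) - 1 >= level for i in adjacent):
        for i in range(n_vars):
            for j in sorted(adjacent[i]):
                candidates = sorted(adjacent[i] - {j})
                for cond in combinations(candidates, level):
                    if is_independent(corr, n_samples, i, j, cond, alpha):
                        adjacent[i].discard(j)
                        adjacent[j].discard(i)
                        break
        level += 1
    return {frozenset((i, j)) for i in adjacent for j in adjacent[i]}


# Hypothetical correlation matrix implied by a chain 0 -> 1 -> 2:
# variables 0 and 2 are correlated, but independent given variable 1.
corr = [
    [1.00, 0.80, 0.64],
    [0.80, 1.00, 0.80],
    [0.64, 0.80, 1.00],
]
skeleton = learn_skeleton(corr, n_samples=1000)
print(sorted(tuple(sorted(edge)) for edge in skeleton))
```

In this example the edge between variables 0 and 2 is removed once variable 1 is used as the conditioning set, leaving only the two edges of the chain. Orienting the remaining edges is a separate step that is not shown here.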

When the true causal structure is given, the so-called do-calculus allows the causal effects in the observed system to be identified. Moreover, the relationships in the causal graph form the basis of estimation procedures that derive the functional relationships needed to predict the result of an intervention in the system.
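To illustrate why the causal structure matters for predicting interventions, the following sketch simulates a toy linear system with a confounder. The graph (Z causes X, Z causes Y, X causes Y), the coefficients, and all function names are assumptions made for this example only. Given the graph, adjusting for Z recovers the effect of X on Y from observational data, whereas a naive regression of Y on X does not. The result is compared against a simulated intervention do(X = x), in which the structural equation of X is replaced by a constant.

```python
import random


def simulate(n, rng, do_x=None):
    """Sample from a toy linear model Z -> X, Z -> Y, X -> Y.

    If do_x is given, the structural equation of X is replaced by the
    constant do_x, which mimics an intervention on X.
    """
    rows = []
    for _ in range(n):
        z = rng.gauss(0, 1)
        x = 2.0 * z + rng.gauss(0, 1) if do_x is None else do_x
        y = 1.5 * x + 3.0 * z + rng.gauss(0, 1)  # assumed true effect of X on Y: 1.5
        rows.append((z, x, y))
    return rows


def cov(a, b):
    """Sample covariance of two equally long sequences."""
    mean_a, mean_b = sum(a) / len(a), sum(b) / len(b)
    return sum((u - mean_a) * (v - mean_b) for u, v in zip(a, b)) / len(a)


def mean(values):
    return sum(values) / len(values)


rng = random.Random(0)
z, x, y = zip(*simulate(20000, rng))

# Naive slope of Y on X: mixes the causal effect with confounding via Z.
naive = cov(x, y) / cov(x, x)

# Coefficient of X in the least-squares regression of Y on X and Z,
# i.e. the estimate after adjusting for the confounder Z.
adjusted = (cov(x, y) * cov(z, z) - cov(z, y) * cov(x, z)) / (
    cov(x, x) * cov(z, z) - cov(x, z) ** 2
)

# Reference: difference in mean Y between two simulated interventions.
y_do1 = [row[2] for row in simulate(20000, rng, do_x=1.0)]
y_do0 = [row[2] for row in simulate(20000, rng, do_x=0.0)]
interventional = mean(y_do1) - mean(y_do0)

print(f"naive={naive:.2f} adjusted={adjusted:.2f} interventional={interventional:.2f}")
```

In this toy setting the adjusted estimate and the interventional difference should both be close to the assumed coefficient 1.5, while the naive slope is biased towards roughly 2.7 by the confounder.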

Research Goals

Our research on causal inference is organized into several workstreams that together aim to enable efficient application in real-world scenarios.

Causal Structure Learning:

  • Allow for more flexible learning algorithms
  • Leverage existing knowledge of relationships

Causal Inference:

  • Causal inference in the context of causal structural knowledge
  • Evaluation of interventional simulation and optimization approaches
  • Active learning of a functional system based on a causal structural model

Hard- and Software Acceleration:

  • Engineering of a modular IT system that enables the application of machine learning techniques in the context of causal inference
  • Application of Graphics Processing Units (GPU) to develop efficient implementations
  • Examination of integration into In-Memory Databases

Use Cases

Together with several cooperation partners, we examine how causal inference can be applied in real-world contexts.

Manufacturing:

Medicine:

  • Application of causal inference in the context of gene expression data
  • Incorporation of genetic variants and gene expression data to detect causal relationships

Database Operations:

  • Examination of causal inference on the basis of system time series data
  • Derivation of insights in the context of unexpected system behavior

Publications

  • Schmidt, C., Huegle, J., Uflacker, M.: Order-independent constraint-based causal structure learning for Gaussian distribution models using GPUs. In: SSDBM '18: Proceedings of the 30th International Conference on Scientific and Statistical Database Management, pp. 19:1-19:10. ACM, New York, NY, USA (2018).