Hasso-Plattner-Institut
Prof. Dr. Christoph Lippert
 

Causal inference in prediction models

In epidemiology, the methodologies for causal inference and for prediction modeling have historically been distinct. Directed Acyclic Graphs (DAGs) are used to encode a priori causal assumptions and to inform variable selection strategies for causal questions. Although tools originally designed for prediction are finding applications in causal inference, the converse, using causal tools for prediction, has remained largely unexplored. The aim of this theoretical and simulation-based study is to assess the potential benefit of using DAGs in clinical risk prediction modeling.

The results show that, in some scenarios, a single-predictor model in the causal direction is likely to have better transportability than one in the anticausal direction. We also show empirically that the Markov Blanket, that is, the set of variables consisting of the parents, the children, and the other parents of the children of the outcome node in a DAG, is the optimal set of predictors for that outcome.
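To make the definition of the Markov Blanket concrete, the following minimal Python sketch reads it off a DAG that is stored as a mapping from each node to its set of parents. The function name `markov_blanket`, the graph representation, and the toy graph with its node names are illustrative assumptions and are not taken from the study.

```python
def markov_blanket(parents, target):
    """Return the Markov Blanket of `target` in a DAG.

    `parents` maps every node to the set of its direct parents
    (a hypothetical representation chosen for this sketch).
    """
    # Children: nodes that list the target among their parents.
    children = {node for node, ps in parents.items() if target in ps}
    # Co-parents: all parents of those children (includes the target itself).
    co_parents = {p for child in children for p in parents[child]}
    blanket = set(parents.get(target, ())) | children | co_parents
    blanket.discard(target)  # the target is not part of its own blanket
    return blanket


# Toy DAG with made-up node names, for illustration only.
dag = {
    "age": set(),
    "genotype": set(),
    "medication": set(),
    "disease": {"age", "genotype"},
    "symptom": {"disease", "medication"},
    "hospital_visit": {"symptom"},
}

print(sorted(markov_blanket(dag, "disease")))
# ['age', 'genotype', 'medication', 'symptom']
```

In this toy graph, `hospital_visit` is a descendant of the outcome but lies outside the blanket, so under the assumed structure it would not be selected as a predictor.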

These findings provide a theoretical basis for the intuition that a diagnostic clinical risk prediction model including causes as predictors is likely to be more transportable. Furthermore, using DAGs to identify Markov Blanket variables may be a useful, efficient strategy to select predictors in clinical risk prediction models if strong knowledge of the underlying causal structure exists or can be learned.

In a current application, we have proposed a causal framework to investigate the transportability of prediction models for Alzheimer's disease in simulated external settings with different distributions of demographic and clinical characteristics. In an ongoing follow-up project, we are investigating the transportability of these prediction models empirically, using different populations from studies in the US and South Korea.

 



Estimating and testing in directed acyclic graphs

Overview:

In genetic association studies and in association studies in general, it is important to distinguish direct and indirect effects in order to build truly functional models. For this purpose, we consider a directed acyclic graph setting with interventions (here: genetic variants), primary and intermediate outcomes, and confounding factors.

To make valid statistical inference on direct genetic effects on the primary outcome variable, it is necessary to consider all potential effects in the graph. For this, we propose to use the estimating equations method with robust Huber–White sandwich standard errors. We evaluate the proposed method, causal inference based on estimating equations (CIEE), in a simulation study and compare it with traditional multiple regression methods, the structural equation modeling method, and sequential G-estimation methods. The simulations cover (completely observed) quantitative traits and time-to-event traits subject to censoring as primary outcome variables.
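As a rough illustration of what a Huber–White sandwich standard error is, the sketch below computes the basic (HC0) sandwich variance for a single ordinary least squares regression with NumPy. It is not the CIEE method and does not reproduce the CIEE R package; the function name `ols_with_sandwich_se` and the toy data are assumptions made for this example only.

```python
import numpy as np


def ols_with_sandwich_se(X, y):
    """OLS coefficients with HC0 (Huber-White) sandwich standard errors.

    X is the design matrix (include a column of ones for an intercept),
    y is the outcome vector.
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    bread = np.linalg.inv(X.T @ X)             # (X'X)^{-1}
    beta = bread @ X.T @ y                     # OLS point estimates
    resid = y - X @ beta
    meat = X.T @ (X * resid[:, None] ** 2)     # X' diag(e_i^2) X
    cov = bread @ meat @ bread                 # sandwich covariance
    return beta, np.sqrt(np.diag(cov))


# Made-up toy data: intercept column plus one predictor.
x = np.array([0.0, 1.0, 2.0, 3.0])
y = np.array([0.0, 1.0, 1.0, 3.0])
X = np.column_stack([np.ones_like(x), x])

beta, se = ols_with_sandwich_se(X, y)
print("coefficients:", beta)
print("robust standard errors:", se)
```

The sandwich form keeps the standard errors valid when the residual variance is not constant, which is the general motivation for using robust standard errors with estimating equations.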

The results show that CIEE provides valid estimators and inference by successfully removing the effect of intermediate variables from the primary outcome, and that it is robust against measured and unmeasured confounding of the indirect effect through observed factors. All other methods, except the sequential G-estimation method for quantitative traits, fail in some scenarios, where their test statistics yield inflated type I error rates. In the analysis of the Genetic Analysis Workshop 19 dataset, we estimate and test genetic effects on blood pressure while accounting for intermediate gene expression. The results show that CIEE can identify genetic variants that would be missed by traditional regression analyses. CIEE is computationally fast, widely applicable to different fields, and available as an R package.

 

References:

  • Konigorski S, Wang Y, Cigsar C, Yilmaz YE (2018). Estimating and testing direct genetic effects in directed acyclic graphs using estimating equations. Genetic Epidemiology 42: 174–186. https://doi.org/10.1002/gepi.22107.
  • Konigorski S, Yilmaz YE. CIEE: Estimating and testing direct effects in directed acyclic graphs using estimating equations. R package version 0.1.1. https://CRAN.R-project.org/package=CIEE.
  • Konigorski S (2021). Causal inference in developmental medicine and neurology. Developmental Medicine & Child Neurology 63(5): 498. https://doi.org/10.1111/dmcn.14813.

 
