The questions that motivate most data analyses in an enterprise context are causal in nature, e.g., what are the causes and effects of events in manufacturing processes under observation? Nevertheless, the statistical methods commonly used today to answer those questions are associational in nature. But “Correlation does not imply causation!”, and misinterpretation often leads to incorrect conclusions [1]. For example, a constant conjunction of a raw material and an error in many past instances of a manufacturing process may falsely suggest that a causal relationship exists, for instance when both are merely effects of a common cause. In that case, avoiding this raw material will not affect the error.
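To make the pitfall concrete, the following minimal sketch (all variable names and probabilities are illustrative assumptions, not taken from a real process) simulates a hidden common cause that produces a strong observational association between a raw material and an error, even though the raw material has no effect on the error:

```python
import numpy as np

# Hypothetical illustration: a hidden common cause (e.g., an old machine)
# makes both the use of a particular raw material and an error more likely,
# although the raw material itself has no effect on the error.
rng = np.random.default_rng(0)
n = 100_000

old_machine = rng.random(n) < 0.5                          # hidden confounder
raw_material = rng.random(n) < np.where(old_machine, 0.8, 0.2)
error = rng.random(n) < np.where(old_machine, 0.3, 0.05)   # independent of raw_material

# Observational association: errors occur far more often together with the raw material ...
print(error[raw_material].mean(), error[~raw_material].mean())

# ... yet intervening on the raw material (do(raw_material = 0)) would not change
# the error rate, because the only connection runs through the hidden machine age.
```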
Based upon causal graphical models, the mathematical theory of causal calculus for modeling interventions provides a framework for causal inference from observational data [2]. The variety of implemented methods for causal inference and the complexity of data characteristics in real-world scenarios create the need for a modular pipeline that allows existing methods from different libraries to be plugged into a single system so that their results can be compared and evaluated, also with a view to improving existing algorithms.
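One way such a modular pipeline could be structured is sketched below; the adapter interface and class names are hypothetical and only illustrate how methods from different libraries might be wrapped behind a common interface and run on the same data for comparison:

```python
from abc import ABC, abstractmethod
from typing import Any, Dict

import pandas as pd


class CausalDiscoveryMethod(ABC):
    """Hypothetical adapter so that methods from different libraries share one interface."""

    @abstractmethod
    def fit(self, data: pd.DataFrame) -> Any:
        """Return an estimated causal graph (library-specific representation)."""


class Pipeline:
    """Minimal sketch: run all registered methods on the same data set and
    collect their results for comparison and evaluation."""

    def __init__(self) -> None:
        self._methods: Dict[str, CausalDiscoveryMethod] = {}

    def register(self, name: str, method: CausalDiscoveryMethod) -> None:
        self._methods[name] = method

    def run(self, data: pd.DataFrame) -> Dict[str, Any]:
        return {name: method.fit(data) for name, method in self._methods.items()}
```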
In complex real-world settings, the underlying causal structures are mostly unknown, which makes the evaluation of methods for causal inference difficult. Therefore, such a pipeline should incorporate a benchmark suite that generates synthetic data from known causal structures to enable a comprehensive evaluation. Hence, it remains to answer “How to model a complex world?”.
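A minimal sketch of one such benchmark case, assuming a hand-picked linear structural causal model (the graph, coefficients, and noise terms are illustrative assumptions, not prescribed by the pipeline), could look as follows:

```python
import numpy as np
import pandas as pd

# Benchmark sketch: sample observational data from a known causal structure
# (here an assumed linear model over the graph X -> Y -> Z, X -> Z), so that
# the output of any causal discovery method can be checked against the
# ground-truth graph.
def sample_known_scm(n: int, seed: int = 0) -> pd.DataFrame:
    rng = np.random.default_rng(seed)
    x = rng.normal(size=n)
    y = 2.0 * x + rng.normal(size=n)
    z = -1.0 * x + 0.5 * y + rng.normal(size=n)
    return pd.DataFrame({"X": x, "Y": y, "Z": z})

ground_truth_edges = {("X", "Y"), ("Y", "Z"), ("X", "Z")}
data = sample_known_scm(10_000)
# `data` together with `ground_truth_edges` forms one benchmark case that the
# pipeline sketched above could evaluate its registered methods against.
```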