Learning causal relationships from observational data provides valuable insights to researchers in many domains. For example, in genetic research, gene regulatory networks can be derived from gene expression data. Real-world gene expression datasets are often high-dimensional, which leads to long execution times that prevent the practical application of constraint-based Causal Structure Learning (CSL) algorithms.
As part of our research on Data-Driven Causal Inference, we investigate how existing CSL algorithms can be adapted to exploit the parallel processing capabilities of modern hardware in order to speed up execution. This facilitates the application of CSL in real-world settings, both in industry and in research. For multi-core CPUs, we introduce dynamic load balancing for the parallel execution of CSL algorithms, which avoids under- and overutilization of compute resources and thereby reduces runtimes.
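The load-balancing scheme itself is not described here, so the following is only a generic sketch of dynamic load balancing applied to one level of a PC-style skeleton search. It assumes that each edge is one task and that idle workers pull the next edge from a shared queue, in contrast to static partitioning, where each worker receives a fixed share of edges up front and may sit idle while another works through edges with many neighbours. The function `run_level_dynamic` and the callable `ci_test` are hypothetical names. Python threads illustrate the scheduling pattern only: because of the GIL, pure-Python CPU-bound tests would not actually run faster.

```python
import queue
import threading
from itertools import combinations


def run_level_dynamic(edges, adjacency, level, ci_test, num_workers=4):
    """Run one level of a PC-style skeleton search with dynamic load balancing.

    edges:     ordered pairs (i, j) still present in the graph
    adjacency: dict mapping each node to the set of its current neighbours
    level:     size of the conditioning sets tested at this level
    ci_test:   hypothetical callable (i, j, cond) -> True if X_i and X_j are
               judged independent given X_cond
    Returns the ordered pairs found conditionally independent at this level.
    """
    tasks = queue.Queue()
    for edge in edges:
        tasks.put(edge)

    removed = []
    lock = threading.Lock()

    def worker():
        # Each worker keeps pulling tasks until the shared queue is empty,
        # so a worker that draws cheap edges simply processes more of them.
        while True:
            try:
                i, j = tasks.get_nowait()
            except queue.Empty:
                return
            # Candidate conditioning sets: subsets of i's other neighbours.
            neighbours = sorted(adjacency[i] - {j})
            for cond in combinations(neighbours, level):
                if ci_test(i, j, cond):
                    with lock:
                        removed.append((i, j))
                    break  # one separating set is enough

    threads = [threading.Thread(target=worker) for _ in range(num_workers)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    # Removals are applied by the caller after the level (PC-stable style),
    # so all workers see the same adjacency sets during this level.
    return removed
```

For a chain X0 → X1 → X2 with an oracle that reports X0 ⟂ X2 | X1, calling the function with `level=1` on the complete graph returns the pairs (0, 2) and (2, 0), in an order that depends on thread scheduling.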
In recent years, GPUs have become a major source of massively parallel processing power. By exploiting the thousands of parallel processing units and the shared memory of GPUs, we reduce runtimes by a factor of up to 700 for multivariate normally distributed data.
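The conditional independence (CI) test is not named here. For multivariate normal data, constraint-based algorithms such as the PC algorithm commonly use Fisher's z-transform of the partial correlation, so the sketch below assumes that test. It needs only the correlation matrix and the sample size n, not the raw observations. The function names and the significance level `alpha` are illustrative assumptions, and the recursive partial-correlation formula is chosen for readability; practical implementations usually invert the correlation submatrix instead, since the recursion grows exponentially with the size of the conditioning set.

```python
import math


def partial_corr(corr, i, j, cond):
    """Partial correlation rho_{ij|cond} from a correlation matrix (list of lists),
    using the standard recursion that removes one conditioning variable at a time."""
    if not cond:
        return corr[i][j]
    k, rest = cond[0], cond[1:]
    r_ij = partial_corr(corr, i, j, rest)
    r_ik = partial_corr(corr, i, k, rest)
    r_jk = partial_corr(corr, j, k, rest)
    return (r_ij - r_ik * r_jk) / math.sqrt((1 - r_ik ** 2) * (1 - r_jk ** 2))


def fisher_z_independent(corr, n, i, j, cond, alpha=0.05):
    """Return True if X_i and X_j are judged independent given X_cond, i.e. the
    hypothesis rho_{ij|cond} = 0 is not rejected at significance level alpha."""
    r = partial_corr(corr, i, j, tuple(cond))
    r = max(min(r, 1 - 1e-12), -1 + 1e-12)  # keep the logarithm finite
    # Fisher's z-transform, scaled by sqrt(n - |cond| - 3).
    z = 0.5 * math.log((1 + r) / (1 - r)) * math.sqrt(n - len(cond) - 3)
    # Two-sided p-value via the standard normal CDF Phi(x) = (1 + erf(x / sqrt(2))) / 2.
    p = 2 * (1 - 0.5 * (1 + math.erf(abs(z) / math.sqrt(2))))
    return p > alpha
```

With correlations rho_01 = 0.8, rho_12 = 0.5 and rho_02 = 0.4 (a chain X0 → X1 → X2), the partial correlation of X0 and X2 given X1 is zero, so the test reports independence given X1 but dependence when conditioning on nothing.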
As a next step, we will generalize our GPU-accelerated algorithm to support other data distributions. To this end, we are developing a definition of tasks for parallel execution that is independent of the underlying data distribution. Such tasks support fine-grained parallelism, e.g., to speed up the processing of raw observational data during conditional independence tests.