Hasso-Plattner-Institut
Prof. Dr.-Ing. Bert Arnrich
 

Pre-Care ML

Funded by ERA PerMed

Predicting Cardiovascular Events Using Machine Learning and EHR

Cardiovascular diseases are the leading cause of death globally, making early identification of patients at high risk of major adverse cardiovascular events (MACE) crucial. Currently, these patients are identified using clinical prediction models based on risk scores, which have several limitations. First, most rule-based scores require manual calculation or additional data entry by physicians, so resource constraints make it infeasible to screen all hospitalized patients. This results in selection bias: only a subset of patients is evaluated, and many high-risk patients remain undetected. Second, traditional risk scores were developed for specific populations, so they do not account for diverse individual risk factors.

An alternative approach is to use predictions based on machine learning (ML) models, which can incorporate a larger number of predictors and account for nonlinearity in the data. In particular, these ML models can use electronic health record (EHR) data to estimate MACE risk for individual patients. This combination of ML models and EHR data allows for rapid, automated, and personalized risk prediction that can be applied to large patient groups. However, although numerous such ML models have been developed in recent years, they are rarely validated, and it remains unclear how they perform in different clinical settings or with different populations.

Considering this, in the PRE-CARE ML project we are further developing and extensively evaluating ML models, specifically federated learning (FL) models, that estimate a patient's MACE risk from their EHR data. The work is structured around three main goals: (i) to validate and improve risk-predicting ML models across different hospital networks and populations, (ii) to integrate ML models into different hospital information systems and evaluate their impact on daily hospital routines, and (iii) to investigate effective risk communication strategies to encourage behavioral changes in patients.

Because the hospitals in our consortium use different hospital information systems, we spent considerable time on data harmonization, a stage that is now nearing completion. Harmonized data allows us to evaluate models developed at one hospital on geographically and demographically diverse EHR data from the other partner institutions, and it is also a crucial requirement for developing models using FL. The first cross-evaluation results have already been produced, and a publication is soon to follow.

Furthermore, we have been actively setting up the FL infrastructure for our experiments, primarily using the NVIDIA FLARE platform, with the first FL experiments planned for January 2024. Finally, regarding the third goal of the project, the investigation of effective risk communication strategies, we are currently conducting a comprehensive systematic literature review of digital risk communication strategies. We will soon start discussing our findings with physicians and with patients who are at risk of or living with a cardiovascular condition.