Systems medicine is an interdisciplinary approach in medicine that relies on computational models based on data from a variety of sources. Typically, such sources include clinical and biomedical data with heterogeneous data definitions that are sometimes not even structured in a useful way. Consequently, the systematic management of data is an important element for the successful implementation of systems medicine in both research and clinical application. In this article, we provide an overview of the following selected aspects of data management: integration of multiple data sources; IT infrastructures; data protection regulations; data history and data quality; data sharing/FAIR principles; and use and access policies. The presented best practices and experiences result from several systems medicine projects in which the authors have participated. They can be considered as recommendations for future projects in order to quickly set up data management infrastructures for systems medicine.
Federated Learning of Electronic Health Records to Improve Mortality Prediction in Hospitalized Patients With COVID-19: Machine Learning Approach.Vaid, Akhil; Jaladanki, Suraj K; Xu, Jie; Teng, Shelly; Kumar, Arvind; Lee, Samuel; Somani, Sulaiman; Paranjpe, Ishan; Freitas, Jessica K De; Wanyan, Tingyi; Johnson, Kipp W; Bicak, Mesude; Klang, Eyal; Kwon, Young Joon; Costa, Anthony; Zhao, Shan; Miotto, Riccardo; Charney, Alexander W; Böttinger, Erwin; Fayad, Zahi A; Nadkarni, Girish N; Wang, Fei; Glicksberg, Benjamin S in JMIR Medical Informatics (2021). 9(1) e24207.
Temporal Trends in COVID-19 associated AKI from March to December 2020 in New York City.Dellepiane, Sergio; Vaid, Akhil; Jaladanki, Suraj K; Paranjpe, Ishan; Coca, Steven; Fayad, Zahi A; Charney, Alexander W; Bottinger, Erwin P; He, John Cijiang; Glicksberg, Benjamin S; Chan, Lili; Nadkarni, Girish (2021).
QRS detectors are used as the most basic processing tool for ECG signals. They are therefore expected to perform well across many situations and on signals with a wide range of characteristics. Despite the expected versatility, most of the published QRS detectors are not tested on a diverse dataset. Using 14 databases, 10,000 heartbeats for each different heartbeat type were extracted to show that there are notable performance differences for the eight tested algorithms. Besides the analysis on heartbeat types, this paper also tests the noise resilience regarding different noise combinations. Each of the tested QRS detectors showed significant differences depending on heartbeat type and noise combination. This leads to the conclusion that before choosing a QRS detector, one should consider its use case and test the detector on data representing it. For authors of QRS detectors, this means that every algorithm evaluation should employ a dataset that is as diverse as the one used in this paper to assess the QRS detector’s performance in an objective and unbiased manner.
Optimal Sensor Placement for Human Activity Recognition with a Minimal Smartphone–IMU Setup.Rahn, Vincent Xeno; Zhou, Lin; Klieme, Eric; Arnrich, Bert (2021). (Vol. 10) 37-48.
Human Activity Recognition (HAR) of everyday activities using smartphones has been intensively researched over the past years. Despite the high detection performance, smartphones cannot continuously provide reliable information about the currently conducted activity as their placement on the subject’s body is uncertain. In this study, a system is developed that enables real-time collection of data from various Bluetooth inertial measurement units (IMUs) in addition to the smartphone. The contribution of this work is an extensive overview of related work in this field and the identification of unobtrusive, minimal combinations of IMUs with the smartphone that achieve high recognition performance. Eighteen young subjects with unrestricted mobility were recorded conducting seven daily-life activities with a smartphone in the pocket and five IMUs at different body positions. With a Convolutional Neural Network (CNN) for activity recognition, activity classification accuracy increased by up to 23% with one IMU additional to the smartphone. An overall prediction rate of 97% was reached with a smartphone in the pocket and an IMU at the ankle. This study demonstrated the potential that an additional IMU can improve the accuracy of smartphone-based HAR on daily-life activities.
It is time to reality check the promises of machine learning-powered precision medicine.Wilkinson, Jack; Arnold, Kellyn F; Murray, Eleanor J; van Smeden, Maarten; Carr, Kareem; Sippy, Rachel; de Kamps, Marc; Beam, Andrew; Konigorski, Stefan; Lippert, Christoph; Gilthorpe, Mark S; Tennant, Peter WG in The Lancet Digital Health (2020).
Evaluation of the Pose Tracking Performance of the Azure Kinect and Kinect v2 for Gait Analysis in Comparison with a Gold Standard: A Pilot Study.Albert, Justin; Owolabi, Victor; Gebel, Arnd; Brahms, Markus Clemens; Granacher, Urs; Arnrich, Bert in MDPI Sensors (2020). 20(18)
Gait analysis is an important tool for the early detection of neurological diseases and for the assessment of risk of falling in elderly people. The availability of low-cost camera hardware on the market today and recent advances in Machine Learning enable a wide range of clinical and health-related applications, such as patient monitoring or exercise recognition at home. In this study, we evaluated the motion tracking performance of the latest generation of the Microsoft Kinect camera, Azure Kinect, compared to its predecessor Kinect v2 in terms of treadmill walking using a gold standard Vicon multi-camera motion capturing system and the 39 marker Plug-in Gait model. Five young and healthy subjects walked on a treadmill at three different velocities while data were recorded simultaneously with all three camera systems. An easy-to-administer camera calibration method developed here was used to spatially align the 3D skeleton data from both Kinect cameras and the Vicon system. With this calibration, the spatial agreement of joint positions between the two Kinect cameras and the reference system was evaluated. In addition, we compared the accuracy of certain spatio-temporal gait parameters, i.e., step length, step time, step width, and stride time calculated from the Kinect data, with the gold standard system. Our results showed that the improved hardware and the motion tracking algorithm of the Azure Kinect camera led to a significantly higher accuracy of the spatial gait parameters than the predecessor Kinect v2, while no significant differences were found between the temporal parameters. Furthermore, we explain in detail how this experimental setup could be used to continuously monitor the progress during gait rehabilitation in older people.
Good News: Wie Data Science dabei hilft, die Corona-Pandemie besser zu verstehen.Schapranow, Matthieu-P. in Portal Wissen: Das Forschungsmagazin der Universität Potsdam (2020). 9(2) 14--19.
The state of the art for monitoring hypertension relies on measuring blood pressure (BP) using uncomfortable cuff-based devices. Hence, for increased adherence in monitoring, a better way of measuring BP is needed. That could be achieved through comfortable wearables that contain photoplethysmography (PPG) sensors. There have been several studies showing the possibility of statistically estimating systolic and diastolic BP (SBP/DBP) from PPG signals. However, they are either based on measurements of healthy subjects or on patients in intensive care units (ICUs). Thus, there is a lack of studies with patients out of the normal range of BP and with daily life monitoring out of the ICUs. To address this, we created a dataset (HYPE) composed of data from hypertensive subjects that executed a stress test and had 24-h monitoring. We then trained and compared machine learning (ML) models to predict BP. We evaluated handcrafted feature extraction approaches vs image representation ones and compared different ML algorithms for both. Moreover, in order to evaluate the models in a different scenario, we used an openly available set from a stress test with healthy subjects (EVAL). The best results for our HYPE dataset were in the stress test and had a mean absolute error (MAE) in mmHg of 8.79 (±3.17) for SBP and 6.37 (±2.62) for DBP; for our EVAL dataset it was 14.74 (±4.06) and 7.12 (±2.32) respectively. Although having tested a range of signal processing and ML techniques, we were not able to reproduce the small error ranges claimed in the literature. The mixed results suggest a need for more comparative studies with subjects out of the intensive care and across all ranges of blood pressure. Until then, the clinical relevance of PPG-based predictions in daily life should remain an open question.
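The error figures above are mean absolute errors in mmHg. As a minimal sketch of how such a figure can be computed, the Python below takes the MAE per subject and then reports the mean and standard deviation across subjects. Whether the ± values in the abstract were obtained this way is an assumption; the function names, the data layout and the numbers are hypothetical and chosen only for illustration.

```python
from statistics import mean, stdev


def mae(predicted, measured):
    """Mean absolute error between two equally long sequences (here: mmHg)."""
    if len(predicted) != len(measured) or not predicted:
        raise ValueError("need two non-empty sequences of equal length")
    return mean(abs(p - m) for p, m in zip(predicted, measured))


def summarize_mae(per_subject):
    """Return (mean, standard deviation) of the per-subject MAE.

    per_subject maps a subject id to (predicted values, measured values).
    Aggregating across subjects is an assumption made for this sketch.
    """
    errors = [mae(pred, meas) for pred, meas in per_subject.values()]
    return mean(errors), stdev(errors)


# Hypothetical systolic BP predictions and cuff measurements for two subjects
example = {
    "subject_a": ([120, 130], [125, 128]),
    "subject_b": ([110, 140], [111, 139]),
}
mean_error, spread = summarize_mae(example)
print(f"MAE: {mean_error:.2f} (±{spread:.2f}) mmHg")
```

Reporting the spread alongside the mean makes it visible when a low average error hides large differences between individual subjects.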
IMU-Based Movement Trajectory Heatmaps for Human Activity Recognition.Konak, Orhan; Wegner, Pit; Arnrich, Bert in Sensors (Switzerland) (2020). 20(24) 1--15.
Recent trends in ubiquitous computing have led to a proliferation of studies that focus on human activity recognition (HAR) utilizing inertial sensor data that consist of acceleration, orientation and angular velocity. However, the performances of such approaches are limited by the amount of annotated training data, especially in fields where annotating data is highly time-consuming and requires specialized professionals, such as in healthcare. In image classification, this limitation has been mitigated by powerful oversampling techniques such as data augmentation. Using this technique, this work evaluates to what extent transforming inertial sensor data into movement trajectories and into 2D heatmap images can be advantageous for HAR when data are scarce. A convolutional long short-term memory (ConvLSTM) network that incorporates spatiotemporal correlations was used to classify the heatmap images. Evaluation was carried out on Deep Inertial Poser (DIP), a known dataset composed of inertial sensor data. The results obtained suggest that for datasets with large numbers of subjects, using state-of-the-art methods remains the best alternative. However, a performance advantage was achieved for small datasets, which is usually the case in healthcare. Moreover, movement trajectories provide a visual representation of human activities, which can help researchers to better interpret and analyze motion patterns.
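The idea of turning a movement trajectory into a 2D heatmap image can be sketched as a simple occupancy count over a grid. The Python below is only an illustration under stated assumptions: the trajectory is taken to be already available as 2D points, and the function name, grid size and binning scheme are hypothetical and not taken from the paper.

```python
def trajectory_heatmap(points, bins=8):
    """Count how often a 2D trajectory visits each cell of a bins x bins grid."""
    if not points:
        raise ValueError("trajectory must contain at least one point")
    xs = [x for x, _ in points]
    ys = [y for _, y in points]
    x_min, x_max = min(xs), max(xs)
    y_min, y_max = min(ys), max(ys)

    def to_bin(value, low, high):
        # Map a coordinate to a cell index; a degenerate axis falls into cell 0
        if high == low:
            return 0
        return min(bins - 1, int((value - low) / (high - low) * bins))

    grid = [[0] * bins for _ in range(bins)]
    for x, y in points:
        grid[to_bin(y, y_min, y_max)][to_bin(x, x_min, x_max)] += 1
    return grid


# Hypothetical trajectory: two samples near the start, two at the end
heatmap = trajectory_heatmap([(0, 0), (0.4, 0.4), (1, 1), (1, 1)], bins=2)
print(heatmap)
```

The resulting grid can be treated like a single-channel image, which is what makes image-style augmentation and convolutional models applicable to the sensor data.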
Spotlight on Women in Tech: Fostering an Inclusive Workforce when Exploring and Exploiting Digital Innovation Potentials.Schmitt, Franziska; Sundermeier, Janina; Bohn, Nicolai; Morassi Sasso, Ariane (2020). (Vol. 6)
Prototypical System to Detect Anxiety Manifestations by Acoustic Patterns in Patients with Dementia.Hernandez, Netzahualcoyotl; Garcia-Constantino, Matias; Beltran, Jessica; Hecker, Pascal; Favela, Jesus; Cleland, Ian; Lopez, Hussein; Arnrich, Bert; McChesney, Ian in EAI Endorsed Transactions on Pervasive Health and Technology (2020). 5(19)
INTRODUCTION: Dementia is a syndrome characterised by a decline in memory, language, and problem-solving that affects the ability of patients to perform everyday activities. Patients with dementia tend to experience episodes of anxiety and remain in them for extended periods, which affects their quality of life. OBJECTIVES: To design AnxiDetector, a system capable of detecting patterns of sounds associated with the periods before and during the manifestation of anxiety in patients with dementia. METHODS: We conducted a non-participatory observation of 70 diagnosed patients in-situ, and conducted semi-structured interviews with four caregivers at a residential centre. Using the findings from our observation and caregiver interviews, we developed the AnxiDetector prototype and tested this in an experimental setting where we defined nine classes of audio to represent two groups of sounds: (i) Disturbance which includes audio files that characterise sounds that trigger anxiety in patients with dementia, and (ii) Expression which includes audio files that characterise sounds expressed by the patients during episodes of anxiety. We conducted two experimental classifications of sounds using (i) a Neural Network model and (ii) a Support Vector Machine model. The first evaluation consists of a binary discrimination between the two groups of sounds; the second evaluation discriminates the nine classes of audio. The audio resources were retrieved from publicly available datasets. RESULTS: The qualitative results present the views of the caregivers on the adoption of AnxiDetector. The quantitative results from our binary discrimination show a classification accuracy of 98.1% and 99.2% for the Deep Neural Network and Support Vector Machine models, respectively. When classifying the nine classes of sound, our model shows a classification accuracy of 92.2%, whereas the Support Vector Machine model yielded an overall classification accuracy of 93.0%.
CONCLUSION: In this paper, we presented the outcomes from an in-situ observational study at a residential care centre, qualitative findings from interviews with caregivers, the design of AnxiDetector, and preliminary qualitative results of a methodology devised to detect relevant acoustic events associated with anxiety in patients with dementia. We conclude by signalling future plans to conduct in-situ validation of the effectiveness of AnxiDetector for anxiety detection.
Towards the automatic detection of social biomarkers in autism spectrum disorder: introducing the simulated interaction task (SIT).Drimalla, Hanna; Scheffer, Tobias; Landwehr, Niels; Baskow, Irina; Roepke, Stefan; Behnia, Behnoush; Dziobek, Isabel in npj digital medicine (2020). 3(25)
Will You Be My Quarantine: A Computer Vision and Inertial Sensor Based Home Exercise System.Albert, Justin; Zhou, Lin; Gloeckner, Pawel; Trautmann, Justin; Ihde, Lisa; Eilers, Justus; Kamal, Mohammed; Arnrich, Bert (2020). (Vol. 14)
The quarantine situation inflicted by the COVID-19 pandemic has left many people around the world isolated at home. Despite the large variety of mobile device-based self exercise tools for training plans, activity recognition or repetition counts, it remains challenging for an inexperienced person to perform fitness workouts or learn a new sport with the correct movements at home. As a proof of concept, a home exercise system has been developed in this contribution. The system takes computer vision and inertial sensor data recorded for the same type of exercise as two independent inputs, and processes the data from both sources into the same representations on the levels of raw inertial measurement unit (IMU) data and 3D movement trajectories. Moreover, a Key Performance Indicator (KPI) dashboard was developed for data import and visualization. The usability of the system was investigated with an example use case where the learner equipped with IMUs performed a kick movement and was able to compare it to that from a coach in the video.
Using CEF Digital Service Infrastructures in the Smart4Health Project for the Exchange of Electronic Health Records.Slosarek, Tamara; Wohlbrandt, Attila; Böttinger, Erwin in arXiv preprint arXiv:2001.01477 (2020).
Background: The coronavirus 2019 (Covid-19) pandemic is a global public health crisis, with over 1.6 million cases and 95,000 deaths worldwide. Data are needed regarding the clinical course of hospitalized patients, particularly in the United States. Methods: Demographic, clinical, and outcomes data for patients admitted to five Mount Sinai Health System hospitals with confirmed Covid-19 between February 27 and April 2, 2020 were identified through institutional electronic health records. We conducted a descriptive study of patients who had in-hospital mortality or were discharged alive. Results: A total of 2,199 patients with Covid-19 were hospitalized during the study period. As of April 2nd, 1,121 (51%) patients remained hospitalized, and 1,078 (49%) completed their hospital course. Of the latter, the overall mortality was 29%, and 36% required intensive care. The median age was 65 years overall and 75 years in those who died. Pre-existing conditions were present in 65% of those who died and 46% of those discharged. In those who died, the admission median lymphocyte percentage was 11.7%, D-dimer was 2.4 ug/ml, C-reactive protein was 162 mg/L, and procalcitonin was 0.44 ng/mL. In those discharged, the admission median lymphocyte percentage was 16.6%, D-dimer was 0.93 ug/ml, C-reactive protein was 79 mg/L, and procalcitonin was 0.09 ng/mL. Conclusions: This is the largest and most diverse case series of hospitalized patients with Covid-19 in the United States to date. Requirement of intensive care and mortality were high. Patients who died typically had pre-existing conditions and severe perturbations in inflammatory markers.
How We Found Our IMU: Guidelines to IMU Selection and a Comparison of Seven IMUs for Pervasive Healthcare Applications.Zhou, Lin; Fischer, Eric; Tunca, Can; Brahms, Clemens Markus; Ersoy, Cem; Granacher, Urs; Arnrich, Bert in Sensors (2020).
Validation of an IMU Gait Analysis Algorithm for Gait Monitoring in Daily Life Situations.Zhou, Lin; Tunca, Can; Fischer, Eric; Brahms, Clemens Markus; Ersoy, Cem; Granacher, Urs; Arnrich, Bert (2020).
Gait is an essential function for humans, and gait patterns in daily life provide meaningful information about a person’s cognitive and physical health conditions. Inertial measurement units (IMUs) have emerged as a promising tool for low-cost, unobtrusive gait analysis. However, large varieties of IMU gait analysis algorithms and the lack of consensus for their validation make it difficult for researchers to assess the reliability of the algorithms for specific use cases. In daily life, individuals adapt their gait patterns in response to changes in the environment, making it necessary for IMU gait analysis algorithms to provide accurate measurements despite these gait variations. In this paper, we reviewed common types of IMU gait analysis algorithms and appropriate analysis methods to evaluate the accuracy of gait parameters extracted from IMU measurements. We then evaluated stride lengths and stride times calculated from a comprehensive double integration based IMU gait analysis algorithm using an optoelectric walkway as gold standard. In total, 729 strides from five healthy subjects and three different walking patterns were analyzed. Correlation analyses and Bland-Altman plots showed that this method is accurate and robust against large variations in walking patterns (stride length: correlation coefficient (r) was 0.99, root mean square error (RMSE) was 3% and average limits of agreement (LoA) was 6%; stride time: r was 0.95, RMSE was 4% and average LoA was 7%), making it suitable for gait evaluation in daily life situations. Due to the small sample size, our preliminary findings should be verified in future studies.
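The agreement measures named above (correlation coefficient, RMSE and Bland-Altman limits of agreement) can be computed from paired measurements as in the following Python sketch. It is an illustration only: the function name, the choice to express RMSE relative to the mean reference value, and the sample stride lengths are assumptions, not the authors' analysis code.

```python
import math
from statistics import mean, stdev


def agreement_metrics(imu, reference):
    """Pearson r, RMSE and Bland-Altman limits of agreement for paired data."""
    if len(imu) != len(reference) or len(imu) < 2:
        raise ValueError("need two equally long series with at least two values")
    mean_imu, mean_ref = mean(imu), mean(reference)

    # Pearson correlation coefficient
    cov = sum((a - mean_imu) * (b - mean_ref) for a, b in zip(imu, reference))
    var_imu = sum((a - mean_imu) ** 2 for a in imu)
    var_ref = sum((b - mean_ref) ** 2 for b in reference)
    r = cov / math.sqrt(var_imu * var_ref)

    # Root mean square error, also expressed relative to the mean reference
    diffs = [a - b for a, b in zip(imu, reference)]
    rmse = math.sqrt(mean(d * d for d in diffs))

    # Bland-Altman: bias plus/minus 1.96 standard deviations of the differences
    bias = mean(diffs)
    half_width = 1.96 * stdev(diffs)
    return {
        "r": r,
        "rmse": rmse,
        "relative_rmse": rmse / mean_ref,
        "bias": bias,
        "loa": (bias - half_width, bias + half_width),
    }


# Hypothetical stride lengths in metres: IMU estimate vs. walkway reference
imu_lengths = [1.21, 1.35, 1.10, 1.42, 1.28]
walkway_lengths = [1.20, 1.33, 1.12, 1.40, 1.30]
print(agreement_metrics(imu_lengths, walkway_lengths))
```

Correlation alone cannot reveal a constant offset between the two systems, which is why the bias and limits of agreement are reported alongside it.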
Literature Review on Transfer Learning for Human Activity Recognition Using Mobile and Wearable Devices with Environmental Technology.Hernandez, Netzahualcoyotl; Lundström, Jens; Favela, Jesus; McChesney, Ian; Arnrich, Bert in SN Computer Science (2020). 1(2) 66.
Constrained expectation maximisation algorithm for estimating ARMA models in state space representation.Galka, Andreas; Moontaha, Sidratul; Siniatchkin, Michael in EURASIP Journal on Advances in Signal Processing 2020.1 (2020). 1-37.
Self-prediction of seizures in drug resistance epilepsy using digital phenotyping: a concept study.Moontaha, Sidratul; Steckhan, Nico; Kappattanavar, Arpita; Surges, Rainer; Arnrich, Bert (2020). (Vol. 14)
Drug resistance is a prevalent condition in children and adult patients with epilepsy. The quality of life of these patients is profoundly affected by the unpredictability of seizure occurrence. Some of these patients are capable of reporting self-prediction of their seizures by observing their affectivity. Other patients report no signs of premonitory symptoms, prodromes, or aura. In this paper, we propose a concept study that will provide objective information to self-predict seizures for both patient groups. We will develop a model using digital phenotyping which takes both ecological momentary assessment and data from sensor technology into consideration. This method will be able to provide patients with feedback on their premonitory symptoms so that a pre-emptive therapy can be applied to reduce seizure frequency or eliminate seizure occurrence.
Federated Learning in a Medical Context: A Systematic Literature Review.Pfitzner, Bjarne; Steckhan, Nico; Arnrich, Bert in ACM Transactions on Internet Technology (TOIT) Special Issue on Security and Privacy of Medical Data for Smart Healthcare (2020).
Data privacy is a very important issue. Especially in fields like medicine, it is paramount to abide by the existing privacy regulations to preserve patients’ anonymity. On the other hand, data is required for research and training machine learning models that could help gain insight into complex correlations or personalised treatments that may otherwise stay undiscovered. Those models generally scale with the amount of data available, but the current situation often prohibits building large databases across sites. So it would be beneficial to be able to combine similar or related data from different sites all over the world while still preserving data privacy. Federated learning has been proposed as a solution for this, because it relies on the sharing of machine learning models, instead of the raw data itself. That means private data never leaves the site or device it was collected on. Federated learning is an emerging research area and many domains have been identified for the application of those methods. This systematic literature review provides an extensive look at the concept of and research into federated learning and its applicability for confidential healthcare datasets.
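The core idea that sites share model parameters rather than raw data can be illustrated with a weighted average of locally trained weights, a common aggregation scheme usually called federated averaging. The Python below is a minimal sketch under stated assumptions: the function name, the flat list representation of model weights and the example numbers are hypothetical, and real systems add many further steps such as secure communication and repeated training rounds.

```python
def federated_average(site_updates):
    """Combine locally trained weights into one global weight vector.

    site_updates is a list of (weights, n_samples) pairs, one per site.
    Only these weights and sample counts leave a site, never patient records.
    """
    if not site_updates:
        raise ValueError("need at least one site update")
    total = sum(n for _, n in site_updates)
    if total <= 0:
        raise ValueError("total number of samples must be positive")
    dim = len(site_updates[0][0])
    averaged = [0.0] * dim
    for weights, n_samples in site_updates:
        if len(weights) != dim:
            raise ValueError("all sites must report weights of the same size")
        # Sites with more local data contribute proportionally more
        for i, w in enumerate(weights):
            averaged[i] += w * n_samples / total
    return averaged


# Hypothetical updates from two hospitals with 100 and 300 local records
global_weights = federated_average([([1.0, 2.0], 100), ([3.0, 4.0], 300)])
print(global_weights)
```

Weighting by sample count keeps a small site from dominating the global model, at the cost of revealing how much data each site holds.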
Position Matters: Sensor Placement for Sitting Posture Classification.Kappattanavar, A. M.; da Cruz, H. F.; Arnrich, B.; Böttinger, E. (2020).
Prolonged sitting behavior and postures that cause strain on the spine and muscles have been reported to increase the probability of low back pain. To address this issue, many commercially available sensors already provide feedback about whether a person is 'slouching' or 'not slouching'. However, they do not provide information on a person's posture, which would give insights into the strain caused by a specific posture. Hence, in this pilot study, we attempt to find the optimum number of inertial measurement unit sensors required and the best locations to place them using six mock postures. Data is collected from these sensors and features are extracted. The number of features is reduced and the best features are selected using the Recursive Feature Elimination method with Cross-Validation. The reduced feature set is then used to train and test Logistic Regression, Support Vector Machine and Hierarchical Model classifiers. Among the three models, the Support Vector Machine algorithm had the highest accuracy of 93.68%, obtained for the thoracic, hip and sacral region sensor combinations. While these findings will be validated in a larger study in an uncontrolled environment, this pilot study quantitatively highlights the importance of sensor placement in shaping discriminative performance in sitting posture classification tasks.
Association of APOL1 Risk Genotype and Air Pollution for Kidney Disease.Paranjpe, Ishan; Chaudhary, Kumardeep; Paranjpe, Manish; O'Hagan, Ross; Manna, Sayan; Jaladanki, Suraj; Kapoor, Arjun; Horowitz, Carol; DeFelice, Nicholas; Cooper, Richard; Glicksberg, Benjamin; Bottinger, Erwin P.; Just, Allan C.; Nadkarni, Girish N. in Clinical Journal of the American Society of Nephrology (2020). 15(3) 401--403.
The Influence of Reward on Facial Mimicry: No Evidence for a Significant Effect of Oxytocin.Trilla, Irene; Drimalla, Hanna; Bajbouj, Malek; Dziobek, Isabel in Frontiers in Behavioural Neuroscience (2020).
GGPONC: A Corpus of German Medical Text with Rich Metadata Based on Clinical Practice Guidelines.Borchert, Florian; Lohr, Christina; Modersohn, Luise; Langer, Thomas; Follmann, Markus; Sachs, Jan Philipp; Hahn, Udo; Schapranow, Matthieu-P. (2020). 38--48.
The lack of publicly accessible text corpora is a major obstacle for progress in natural language processing. For medical applications, unfortunately, all language communities other than English are low-resourced. In this work, we present GGPONC (German Guideline Program in Oncology NLP Corpus), a freely distributable German language corpus based on clinical practice guidelines for oncology. This corpus is one of the largest ever built from German medical documents. Unlike clinical documents, clinical guidelines do not contain any patient-related information and can therefore be used without data protection restrictions. Moreover, GGPONC is the first corpus for the German language covering diverse conditions in a large medical subfield and provides a variety of metadata, such as literature references and evidence levels. By applying and evaluating existing medical information extraction pipelines for German text, we are able to draw comparisons for the use of medical language to other corpora, medical and non-medical ones.
Latente Tuberkulose bei medizinischem Personal in Deutschland nach Auslandseinsatz.Meier, I; Schablon, A; Nienhaus, A; Konigorski, S in Pneumologie (2020). 74 1-7.
Outcomes of Patients on Maintenance Dialysis Hospitalized with COVID-19.Chan, Lili; Jaladanki, Suraj K.; Somani, Sulaiman; Paranjpe, Ishan; Kumar, Arvind; Zhao, Shan; Kaufman, Lewis; Leisman, Staci; Sharma, Shuchita; He, John Cijiang; Murphy, Barbara; Fayad, Zahi A.; Levin, Matthew A.; Bottinger, Erwin P.; Charney, Alexander W.; Glicksberg, Benjamin S.; Coca, Steven G.; Nadkarni, Girish N. in Clinical Journal of the American Society of Nephrology (2020). CJN.12360720.
Machine Learning to Predict Mortality and Critical Events in a Cohort of Patients With COVID-19 in New York City: Model Development and Validation.Vaid, Akhil; Somani, Sulaiman; Russak, Adam J; Freitas, Jessica K De; Chaudhry, Fayzan F; Paranjpe, Ishan; Johnson, Kipp W; Lee, Samuel J; Miotto, Riccardo; Richter, Felix; Zhao, Shan; Beckmann, Noam D; Naik, Nidhi; Kia, Arash; Timsina, Prem; Lala, Anuradha; Paranjpe, Manish; Golden, Eddye; Danieletto, Matteo; Singh, Manbir; Meyer, Dara; O'Reilly, Paul F; Huckins, Laura; Kovatch, Patricia; Finkelstein, Joseph; Freeman, Robert M.; Argulian, Edgar; Kasarskis, Andrew; Percha, Bethany; Aberg, Judith A; Bagiella, Emilia; Horowitz, Carol R; Murphy, Barbara; Nestler, Eric J; Schadt, Eric E; Cho, Judy H; Cordon-Cardo, Carlos; Fuster, Valentin; Charney, Dennis S; Reich, David L; Bottinger, Erwin P; Levin, Matthew A; Narula, Jagat; Fayad, Zahi A; Just, Allan C; Charney, Alexander W; Nadkarni, Girish N; Glicksberg, Benjamin S in Journal of Medical Internet Research (2020). 22(11) e24018.
Using Interpretability Approaches to Update “Black-Box” Clinical Prediction Models: an External Validation Study in Nephrology.da Cruz, Harry Freitas; Pfahringer, Boris; Martensen, Tom; Schneider, Frederic; Meyer, Alexander; Bottinger, Erwin; Schapranow, Matthieu-P. in Artificial Intelligence in Medicine (2020). 101982.
The SARS-CoV-2 effective reproduction rate has a high correlation with a contact index derived from large-scale individual location data using GPS-enabled mobile phones in Germany.Rüdiger, S; Konigorski, S; Edelman, J; Zernick, D; Lippert, C; Thieme, A (2020).
Characterization of Patients Who Return to Hospital Following Discharge from Hospitalization for COVID-19.Somani, Sulaiman S.; Richter, Felix; Fuster, Valentin; Freitas, Jessica K. De; Naik, Nidhi; Sigel, Keith; Bottinger, Erwin P; Levin, Matthew A.; Fayad, Zahi; Just, Allan C.; Charney, Alexander W.; Zhao, Shan; Glicksberg, Benjamin S.; Lala, Anuradha; Nadkarni, Girish N. in Journal of General Internal Medicine (2020). 35(10) 2838--2844.
The UK Biobank is a prospective study of 502,543 individuals, combining extensive phenotypic and genotypic data with streamlined access for researchers around the world(1). Here we describe the release of exome-sequence data for the first 49,960 study participants, revealing approximately 4 million coding variants (of which around 98.6% have a frequency of less than 1%). The data include 198,269 autosomal predicted loss-of-function (LOF) variants, a more than 14-fold increase compared to the imputed sequence. Nearly all genes (more than 97%) had at least one carrier with a LOF variant, and most genes (more than 69%) had at least ten carriers with a LOF variant. We illustrate the power of characterizing LOF variants in this population through association analyses across 1,730 phenotypes. In addition to replicating established associations, we found novel LOF variants with large effects on disease traits, including PIEZO1 on varicose veins, COL6A1 on corneal resistance, MEPE on bone density, and IQGAP2 and GMPR on blood cell traits. We further demonstrate the value of exome sequencing by surveying the prevalence of pathogenic variants of clinical importance, and show that 2% of this population has a medically actionable variant. Furthermore, we characterize the penetrance of cancer in carriers of pathogenic BRCA1 and BRCA2 variants. Exome sequences from the first 49,960 participants highlight the promise of genome sequencing in large population-based studies and are now accessible to the scientific community.
Utilization of Deep Learning for Subphenotype Identification in Sepsis-Associated Acute Kidney Injury.Chaudhary, Kumardeep; Vaid, Akhil; Duffy, Áine; Paranjpe, Ishan; Jaladanki, Suraj; Paranjpe, Manish; Johnson, Kipp; Gokhale, Avantee; Pattharanitima, Pattharawin; Chauhan, Kinsuk; O'Hagan, Ross; Vleck, Tielman Van; Coca, Steven G.; Cooper, Richard; Glicksberg, Benjamin; Bottinger, Erwin P.; Chan, Lili; Nadkarni, Girish N. in Clinical Journal of the American Society of Nephrology (2020). CJN.09330819.
"Herr Doktor, verstehen Sie mich?“: Wie lernende Systeme helfen medizinische Fachsprache zu verstehen und welche Rolle klinische Leitlinien dabei spielen.Borchert, Florian; Lohr, Christina; Modersohn, Luise; Hahn, Udo; Langer, Thomas; Wenzel, Gregor; Follmann, Markus; Schapranow, Matthieu-P. in gesundhyte.de: Das Magazin für Digitale Gesundheit in Deutschland (2020). 13 19--22.
Federated learning has the potential to make machine learning applicable to highly privacy-sensitive domains and distributed datasets. In some scenarios, however, a central server for aggregating the partial learning results is not available. In fully decentralized learning, a network of peer-to-peer nodes collaborates to form a consensus on a global model without a trusted aggregating party. Often, the network consists of Internet of Things (IoT) and Edge computing nodes. Previous approaches for decentralized learning map the gradient batching and averaging algorithm from traditional federated learning to blockchain architectures. In an open network of participating nodes, the threat of adversarial nodes introducing poisoned models into the network increases compared to a federated learning scenario which is controlled by a single authority. Hence, the decentralized architecture must additionally include a machine learning-aware fault tolerance mechanism to address the increased attack surface. We propose a tangle architecture for decentralized learning, where the validity of model updates is checked as part of the basic consensus. We provide an experimental evaluation of the proposed architecture, showing that it performs well in both model convergence and model poisoning protection.
A Machine Learning Approach for Non-Invasive Diagnosis of Metabolic Syndrome.Datta, Suparno; Schraplau, Anne; da Cruz, Harry Freitas; Sachs, Jan Philipp; Mayer, Frank; Böttinger, Erwin (2019). 933--940.
A multi-site study on walkability, data sharing and privacy perception using mobile sensing data gathered from the mk-sense platform.Hernández, N; Arnrich, Bert; Favela, J; Ersoy, C; Demiray, Burcu; Fontecha, J in Journal of Ambient Intelligence and Humanized Computing (2019). 10 2199-2211.
SVD Square-root Iterated Extended Kalman Filter for Modeling of Epileptic Seizure Count Time Series with External Inputs.Moontaha, Sidratul; Galka, Andreas; Siniatchkin, Michael; Scharlach, Sascha; von Spiczak, Sarah; Stephani, Ulrich; May, Theodor; Meurer, Thomas (2019). (Vol. 41) 616-619.
In this paper a nonlinear filtering algorithm for count time series is developed that takes the non-negativity of the data into account and preserves positive definiteness of the covariance matrices of the model. For this purpose, a recently proposed variant of Kalman Filtering based on Singular Value Decomposition is incorporated into Iterative Extended Kalman Filtering, in order to estimate the states of a nonlinear state space model. The resulting algorithm is applied to the evaluation and design of therapies for patients suffering from Myoclonic Astatic Epilepsy, employing time series of daily seizure rate. The analysis provides a decision whether for a specific patient a particular anti-epileptic drug is increasing or reducing the seizure rate. Through a simulation study the proposed algorithm is validated. Additionally, for clinical data results obtained by the proposed algorithm are compared with the results from a Cox-Stuart trend test as well as with the visual assessment of experienced pediatric epileptologists.
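The Cox-Stuart trend test mentioned above as a comparison baseline can be illustrated with a minimal sketch. This is a generic textbook version of the sign-based test, not the authors' implementation; the function name `cox_stuart` and the example seizure counts are hypothetical.

```python
from math import comb


def cox_stuart(series):
    """Two-sided Cox-Stuart trend test (generic sketch).

    Pairs each value in the first half of the series with the value
    the same distance into the second half, counts the signs of the
    differences, and applies an exact binomial test with p = 0.5.
    Returns (direction, p_value).
    """
    n = len(series)
    half = n // 2
    first = series[:half]
    second = series[n - half:]  # drops the middle value when n is odd
    # Tied pairs carry no sign information and are discarded.
    diffs = [b - a for a, b in zip(first, second) if b != a]
    m = len(diffs)
    if m == 0:
        return "none", 1.0
    pos = sum(1 for d in diffs if d > 0)
    neg = m - pos
    k = min(pos, neg)
    # Exact lower-tail binomial probability P(X <= k) for X ~ Bin(m, 0.5).
    tail = sum(comb(m, i) for i in range(k + 1)) / 2 ** m
    p_value = min(1.0, 2 * tail)
    if pos > neg:
        direction = "increasing"
    elif neg > pos:
        direction = "decreasing"
    else:
        direction = "none"
    return direction, p_value


# Hypothetical daily seizure counts, for illustration only.
counts = [9, 8, 9, 7, 8, 6, 5, 6, 4, 3, 4, 2]
direction, p_value = cox_stuart(counts)
print(direction, p_value)
```

Because the test uses only the signs of paired differences, it makes no assumption about the distribution of the counts, which is one reason it serves as a simple reference against model-based approaches.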
Bewertung von Therapieeffekten bei Epilepsie: Eine vergleichende Analyse zwischen Cox-Stuart-Berechnung und Zustandsraum-Modellierung.Scharlach, Sascha; Moontaha, Sidratul; von Spiczak, Sarah; Stephani, Ulrich; Siniatchkin, Michael; May, Theodor; Galka, Andreas; Meurer, Thomas (2019).
From face to face: the contribution of facial mimicry to cognitive and emotional empathy.Drimalla, Hanna; Landwehr, Niels; Hess, Ursula; Dziobek, Isabel in Cognition and Emotion (2019). 33(8) 1672-1686.
Association of dietary intake of milk and dairy products with blood concentrations of insulin-like growth factor 1 (IGF-1) in Bavarian adults.Romo Ventura, E; Konigorski, S; Rohrmann, S; Schneider, H; Stalla, GK; Pischon, T; Linseisen, J; Nimptsch, K in European Journal of Nutrition (2019). 59 1413–1420.
Genome-wide association studies (GWAS) have laid the foundation for investigations into the biology of complex traits, drug development and clinical guidelines. However, the majority of discovery efforts are based on data from populations of European ancestry(1-3). In light of the differential genetic architecture that is known to exist between populations, bias in representation can exacerbate existing disease and healthcare disparities. Critical variants may be missed if they have a low frequency or are completely absent in European populations, especially as the field shifts its attention towards rare variants, which are more likely to be population-specific(4-10). Additionally, effect sizes and the risk prediction scores derived from them in one population may not accurately extrapolate to other populations(11,12). Here we demonstrate the value of diverse, multi-ethnic participants in large-scale genomic studies. The Population Architecture using Genomics and Epidemiology (PAGE) study conducted a GWAS of 26 clinical and behavioural phenotypes in 49,839 non-European individuals. Using strategies tailored for analysis of multi-ethnic and admixed populations, we describe a framework for analysing diverse populations, identify 27 novel loci and 38 secondary signals at known loci, as well as replicate 1,444 GWAS catalogue associations across these traits. Our data show evidence of effect-size heterogeneity across ancestries for published GWAS associations, substantial benefits for fine-mapping using diverse cohorts and insights into clinical implications. In the United States, where minority populations have a disproportionately higher burden of chronic conditions(13), the lack of representation of diverse populations in genetic research will result in inequitable access to precision medicine for those with the highest burden of disease.
We strongly advocate for continued, large genome-wide efforts in diverse populations to maximize genetic discovery and reduce health disparities.
A Federated In-memory Database System for Life Sciences.Schapranow, Matthieu-P.; et al. in Real-Time Business Intelligence and Analytics. BIRTE 2015, BIRTE 2016, BIRTE 2017, Castellanos, M., Chrysanthis, P., Pelechrinis, K. (eds.) (2019). (Vol. 337)
Knowledge Distillation from Machine Learning Models for Prediction of Hemodialysis Outcomes.Freitas da Cruz, Harry; Horschig, Siegfried; Nusshag, Christian; Schapranow, Matthieu-P. in International Journal On Advances in Life Sciences (2019). 11(1-2) 33-43.
To compensate for severe impairments of renal function, artificial, extracorporeal devices, so-called dialyzers, have been developed to enable renal replacement therapy. The parameters utilized in this form of therapy and the specific patient characteristics substantially affect individual patient outcomes and overall disease progression. In this paper, we present a clinical prediction model for outcomes of critically ill patients who underwent a specific form of renal replacement, hemodialysis. For this purpose, we employed two categories of machine learning models: interpretable (Bayesian rule lists and logistic regression) and non-interpretable (multilayer perceptron and random forest). To provide more transparency to the latter category, we applied mimic learning and feature importance metrics. Results show that non-interpretable models outperform the rule-based classifier (c-statistic ≥ 0.9). Despite this result, the use of interpretability methods enables more thorough model scrutiny by medical experts, revealing possible model biases that might otherwise have been disregarded.
MORPHER – A Platform to Support Modeling of Outcome and Risk Prediction in Health Research.Freitas da Cruz, Harry; Bergner, Benjamin; Konak, Orhan; Schneider, Frederic; Bode, Philipp; Lempert, Conrad; Schapranow, Matthieu-P. (2019).
Imitation und Erkennung von Emotionen bei Autismus-Spektrum-Störungen - eine computerbasierte Analyse des fazialen Emotionsausdrucks.Drimalla, Hanna; Baskow, Irina; Roepke, Stefan; Behnia, Behnoush; Dziobek, Isabel in 12. Wissenschaftliche Tagung Autismus-Spektrum (2019).
Prediction of Acute Kidney Injury in Cardiac Surgery Patients: Interpretation using Local Interpretable Model-agnostic Explanations.Freitas da Cruz, Harry; Schneider, Frederic; Schapranow, Matthieu-P. (2019). (Vol. 5) 380-387.
External Validation of a “Black-Box” Clinical Predictive Model in Nephrology: Can Interpretability Methods Help Illuminate Performance Differences?Freitas da Cruz, Harry; Pfahringer, Boris; Schneider, Frederic; Meyer, Alexander; Schapranow, Matthieu-P. (2019). 191-201.
In light of growing data volumes and continuing digitization in fields such as Industry 4.0 or the Internet of Things, data stream processing has gained popularity and importance. Enterprises in particular can benefit from this development by augmenting their vital core business data with up-to-date streaming information. Enriching this transactional data with detailed information from high-frequency data streams allows answering new analytical questions as well as improving current analyses, e.g., regarding predictive maintenance. Comparing such data stream processing architectures for use in an enterprise context, i.e., when combining streaming and business data, is currently a challenging task as there is no suitable benchmark.
DEAME-Differential Expression Analysis Made Easy.Kraus, Milena; Hesse, Guenter; Slosarek, Tamara; Danner, Marius; Kesar, Ajay; Bhushan, Akshay; Schapranow, Matthieu-P in Heterogeneous Data Management, Polystores, and Analytics for Healthcare (2018). 162--174.
Analysis of the effects of medication for the treatment of epilepsy by ensemble Iterative Extended Kalman Filtering.Moontaha, Sidratul; Galka, Andreas; Meurer, Thomas; Siniatchkin, Michael (2018). (Vol. 40) 187-190.
This paper proposes an objective methodology for the analysis of epileptic seizure count time series by developing a non-linear state space model. An iterative extended Kalman filter (IEKF) is employed for the estimation of the states of the non-linear state space model. In order to improve convergence of the IEKF, the recently proposed Levenberg-Marquardt variant of the IEKF is explored. Time-dependent dosages of several simultaneously administered anticonvulsants are included as external inputs. The aim of the analysis is to decide whether each anticonvulsant decreases or increases the number of seizures per day. The performance of the method is evaluated on simulated data as well as on real data from a patient suffering from myoclonic-astatic epilepsy.
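For orientation, the measurement update of a generic iterated extended Kalman filter can be written as follows. This is the standard textbook form, not necessarily the exact formulation used in the paper. All symbols are assumptions introduced here: predicted state $\hat{x}^-$ with covariance $P^-$, observation $y$, observation function $h$ with Jacobian $H_i$ evaluated at the current iterate $x_i$, and observation noise covariance $R$.

```latex
\begin{align}
  x_0     &= \hat{x}^-, \\
  H_i     &= \left.\frac{\partial h}{\partial x}\right|_{x = x_i}, \\
  K_i     &= P^- H_i^\top \left( H_i P^- H_i^\top + R \right)^{-1}, \\
  x_{i+1} &= \hat{x}^- + K_i \left( y - h(x_i) - H_i \left( \hat{x}^- - x_i \right) \right), \\
  P^+     &= \left( I - K_i H_i \right) P^- \quad \text{(evaluated at the final iterate)}.
\end{align}
```

With a single iteration this reduces to the ordinary extended Kalman filter update. The Levenberg-Marquardt variant mentioned in the abstract adds a damping term to this iteration, which is not shown here.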
Integrating omics and MRI data with kernel-based tests and CNNs to identify rare genetic markers for Alzheimer's disease.Konigorski, Stefan; Khorasani, Shahryar; Lippert, Christoph (2018).
Olelo: a web application for intuitive exploration of biomedical literature.Kraus, Milena; Niedermeier, Julian; Jankrift, Marcel; Tietboehl, Soeren; Stachewicz, Toni; Folkerts, Hendrik; Uflacker, Matthias; Neves, Mariana in Nucleic acids research (2017).
Die digitale Transformation mitgestalten — Der Datenspendeausweis: Souveräner Umgang mit persönlichen Gesundheitsdaten.Schapranow, Matthieu-P. in Plattform Life Sciences, (H. Garbs, ed.) (2017). (1) 38--39.
A web-based information system for a regional public mental healthcare service network in Brazil.Yoshiura, Vinicius Tohoru; Azevedo-Marques, Joao Mazzoncini; Rzewuska, Magdalena; Vinci, Andre Luiz Teixeira; Sasso, Ariane Morassi; Miyoshi, Newton Shydeo Brandao; Furegato, Antonia Regina Ferreira; Rijo, Rui Pedro Charters Lopes; Del-Ben, Cristina Marta; Alves, Domingos in International journal of mental health systems (2017). 11(1) 1.
Towards An Integrated Health Research Process: A Cloud-based Approach.Schapranow, Matthieu-P.; Uflacker, Matthias; Sariyar, Murat; Semler, Sebastian; Fichte, Johannes; Schielke, Dietmar; Ekinci, Kismet; Zahn, Thomas in Proceedings of The IEEE International Conference on Big Data (2016). 2813--2818.
Today, health research and health care generate a steadily increasing amount of data. Making these data available for secondary use is essential for efficiency gains in health research, e.g. by reducing time- and cost-intensive acquisition of data. In this contribution, we introduce our SAHRA software platform, which enables reproducible research, e.g. by combining multiple data sources, performing data de-identification, and content filtering. We define an innovative research process combining retrospective and prospective research for the first time. Thus, authorized users, e.g. clinical researchers, are able to access relevant research data through our system and to perform interactive analyses. As a result, existing sensitive health data is securely transformed into de-identified research data, which can be used to improve future health research.
Datenspendeausweis für Bürger: Ein Plädoyer für mündige Patienten, die die eigenen Gesundheitsdaten am besten verstehen.Schapranow, Matthieu-P. in Management & Krankenhaus (2016). (9)
Recruitment of participants for clinical trials is a complex task involving screening of hundreds of thousands of candidates, e.g., testing for trial-specific inclusion and exclusion criteria. Today, a significant amount of time is spent on manual screening, as improperly selected candidates have an impact on the overall study results. We introduce a candidate eligibility metric, which allows systematic ranking and classification of candidates based on trial-specific filter criteria in an automatic way. It is implemented as part of our web application, which enables real-time analysis of patient data and assessment of candidates. Thus, the time for identification of eligible candidates is substantially reduced, while additional degrees of freedom for assessing the relevance of individual candidates are available.
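The abstract does not define the eligibility metric itself. As a purely illustrative sketch, one simple way to rank candidates against inclusion and exclusion criteria is to score each candidate by the fraction of criteria satisfied. The names `Criterion`, `eligibility_score`, and `rank_candidates`, the patient fields, and the scoring rule are all hypothetical and are not taken from the paper.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class Criterion:
    """A trial-specific filter (hypothetical structure)."""
    name: str
    test: Callable[[dict], bool]
    inclusion: bool = True  # False marks an exclusion criterion


def eligibility_score(patient, criteria):
    """Fraction of criteria the patient satisfies (assumed metric).

    An inclusion criterion is satisfied when its test is true;
    an exclusion criterion is satisfied when its test is false.
    """
    if not criteria:
        return 0.0
    met = sum(1 for c in criteria if c.test(patient) == c.inclusion)
    return met / len(criteria)


def rank_candidates(patients, criteria):
    """Return (score, patient id) pairs, best candidates first."""
    scored = [(eligibility_score(p, criteria), p["id"]) for p in patients]
    return sorted(scored, key=lambda s: (-s[0], s[1]))


# Hypothetical criteria and patients, for illustration only.
criteria = [
    Criterion("adult", lambda p: p["age"] >= 18),
    Criterion("type 2 diabetes", lambda p: p["diagnosis"] == "T2D"),
    Criterion("pregnant", lambda p: p["pregnant"], inclusion=False),
]
patients = [
    {"id": "B", "age": 17, "diagnosis": "T2D", "pregnant": False},
    {"id": "C", "age": 30, "diagnosis": "T1D", "pregnant": True},
    {"id": "A", "age": 54, "diagnosis": "T2D", "pregnant": False},
]
ranking = rank_candidates(patients, criteria)
print(ranking)
```

A graded score rather than a strict pass/fail filter keeps near-miss candidates visible for manual review, which is one way to read the "additional degrees of freedom" the abstract refers to.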
Surveillance and Outbreak Response Management System (SORMAS) to support the control of the Ebola virus disease outbreak in West Africa.Fähnrich, Cindy; Denecke, Kerstin; Adeoye, Olawunmi; Benzler, Justus; Claus, Hermann; Kirchner, Göran; Mall, Sabine; Richter, Ralph; Schapranow, Matthieu-P.; Schwarz, Norbert G.; Tom-Aba, Daniel; Uflacker, Matthias; Poggensee, Gabriele; Krause, Gerard in Euro Surveillance (2015).
The Medical Knowledge Cockpit: Real-time Analysis of Big Medical Data Enabling Precision Medicine.Schapranow, Matthieu-P.; Kraus, Milena; Perscheid, Cindy; Bock, Cornelius; Liedtke, Franz; Plattner, Hasso (2015). 770-775.
In-Memory Computing Enabling Real-time Genome Data Analysis.Haeger, Franziska; Schapranow, Matthieu-P.; Fähnrich, Cindy; Ziegler, Emanuel; Plattner, Hasso in International Journal on Advances in Life Sciences, Vol 6, Nr 1-2 (2014).
In-Memory Technology Enables History-Based Access Control for RFID-Aided Supply Chains.Schapranow, Matthieu-P.; Plattner, Hasso in The Secure Information Society: Ethical, Legal and Political Challenges, pp. 187-213 (2013).
Blitzschnelle Datenanalysen für die personalisierte Medizin der Zukunft – Interdisziplinäre Impulse aus Potsdam und Berlin.Plattner, Hasso; Meinel, Christoph; Schapranow, Matthieu-P. in Themenbroschüre 2012 Gesundheitsstandort Berlin-Brandenburg (2012).
Secure RFID-Enablement in Modern Companies: A Case Study of the Pharmaceutical Industry.Müller, Jürgen; Schapranow, Matthieu-P.; Zeier, Alexander; Plattner, Hasso in Handbook of Research on Industrial Informatics and Manufacturing Intelligence: Innovations and Solutions, pp. 507-539, IGI Global (2012).
Discovery Services in the EPC Network.Schapranow, Matthieu-P.; Zeier, Alexander; Plattner, Hasso; Müller, Jürgen; Lorenz, Martin in Designing and Deploying RFID Applications, pp. 109-130, INTECH Press (2011).
Costs of Authentic Pharmaceuticals: Research on Qualitative and Quantitative Aspects of Enabling Anti-counterfeiting in RFID-aided Supply Chains.Zeier, Alexander; Plattner, Hasso; Schapranow, Matthieu-P.; Müller, Jürgen in Personal and Ubiquitous Computing, Volume 16, Issue 3 (2011). 271-289.
Assessment of Communication Protocols in the EPC Network: Replacing Textual SOAP and XML with Binary Google Protocol Buffers Encoding.Schapranow, Matthieu-P.; Geller, Felix; Lorenz, Martin; Müller, Jürgen; Kowark, Thomas; Zeier, Alexander (2010).
Integration of RFID Technology is a Key Enabler for Demand-Driven Supply Network.Schapranow, Matthieu-P.; Müller, Jürgen; Krüger, Jens; Hofmann, Paul; Zeier, Alexander in The IUP Journal of Supply Chain Management, Volume 6, Nos. 3 & 4, pp. 57-74 (2009).
RFID Middleware as a Service - Enabling Small and Medium-sized Enterprises to Participate in the EPC Network.Müller, Jürgen; Schapranow, Matthieu-P.; Helmich, Marco; Enderlein, Sebastian; Zeier, Alexander (2009).
noFilis CrossTalk 2.0 as Device Management Solution, Experiences while Integrating RFID Hardware into SAP Auto-ID Infrastructure.Zeier, Alexander; Schapranow, Matthieu-P.; Krüger, Jens; Uflacker, Matthias; Müller, Jürgen (2009).
Hasso Plattner Institute for Digital Health at Mount Sinai (HPI·MS)
In March 2019, the Hasso Plattner Institute for Digital Health at Mount Sinai (HPI·MS) was formed as the result of a cooperation agreement between the Mount Sinai Health System (MSHS) in New York City and the Hasso Plattner Institute (HPI). www.hpims.org/
Prof. Dr. med. Erwin Böttinger Professor for Digital Health - Personalized Medicine and Head of Digital Health Center erwin.boettinger(at)hpi.de www.hpi.de/boettinger