Forecasting Thresholds Alarms in Medical Patient Monitors using Time Series Models. Chromik, Jonas; Pfitzner, Bjarne; Ihde, Nina; Michaelis, Marius; Schmidt, Denise; Klopfenstein, Sophie; Poncette, Akira-Sebastian; Balzer, Felix; Arnrich, Bert (2022). 26–34.
Too many alarms are a persistent problem in today’s intensive care medicine, leading to alarm desensitisation and alarm fatigue. This puts patients and staff at risk. We propose a forecasting strategy for threshold alarms in patient monitors in order to replace alarms that are actionable right now with scheduled tasks in an attempt to remove the urgency from the situation. To this end, we employ both statistical and machine learning models for time series forecasting and apply these models to vital parameter data such as blood pressure, heart rate, and oxygen saturation. The results are promising, although impaired by low and non-constant sampling frequencies of the time series data in use. The combination of a GRU model with medium-resampled data shows the best performance for most types of alarms. However, higher time resolution and constant sampling frequencies are needed in order to meaningfully evaluate our approach.
Defending against Reconstruction Attacks through Differentially Private Federated Learning for Classification of Heterogeneous Chest X-ray Data. Ziegler, Joceline; Pfitzner, Bjarne; Schulz, Heinrich; Saalbach, Axel; Arnrich, Bert in Sensors, (F. Marulli; L. Verde, eds.) (2022). 22(14)
Privacy regulations and the physical distribution of heterogeneous data are often primary concerns for the development of deep learning models in a medical context. This paper evaluates the feasibility of differentially private federated learning for chest X-ray classification as a defense against data privacy attacks. To the best of our knowledge, we are the first to directly compare the impact of differentially private training on two different neural network architectures, DenseNet121 and ResNet50. Extending the federated learning environments previously analyzed in terms of privacy, we simulated a heterogeneous and imbalanced federated setting by distributing images from the public CheXpert and Mendeley chest X-ray datasets unevenly among 36 clients. Both non-private baseline models achieved an area under the receiver operating characteristic curve (AUC) of 0.94 on the binary classification task of detecting the presence of a medical finding. We demonstrate that both model architectures are vulnerable to privacy violation by applying image reconstruction attacks to local model updates from individual clients. The attack was particularly successful during later training stages. To mitigate the risk of a privacy breach, we integrated Rényi differential privacy with a Gaussian noise mechanism into local model training. We evaluate model performance and attack vulnerability for privacy budgets ε ∈ {1, 3, 6, 10}. The DenseNet121 achieved the best utility-privacy trade-off with an AUC of 0.94 for ε = 6. Model performance deteriorated slightly for individual clients compared to the non-private baseline. The ResNet50 only reached an AUC of 0.76 in the same privacy setting. Its performance was inferior to that of the DenseNet121 for all considered privacy constraints, suggesting that the DenseNet121 architecture is more robust to differentially private training.
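The Gaussian noise mechanism mentioned in this abstract can be illustrated with a minimal sketch. It is a generic clip-then-add-noise step, not the paper's implementation: the function name `clip_and_noise` and the parameters `clip_norm` and `noise_multiplier` are hypothetical, and the Rényi accounting that maps a noise level to a privacy budget ε is not shown.

```python
import math
import random


def clip_and_noise(update, clip_norm, noise_multiplier, rng):
    """Clip a vector to L2 norm `clip_norm`, then add Gaussian noise.

    Hypothetical helper for illustration only; the noise standard
    deviation is assumed to be noise_multiplier * clip_norm.
    """
    norm = math.sqrt(sum(x * x for x in update))
    # Scale down only if the vector exceeds the clipping bound.
    scale = min(1.0, clip_norm / norm) if norm > 0 else 1.0
    clipped = [x * scale for x in update]
    sigma = noise_multiplier * clip_norm
    return [x + rng.gauss(0.0, sigma) for x in clipped]


rng = random.Random(0)
# With a noise multiplier of 0 only the clipping step is visible.
clipped_only = clip_and_noise([3.0, 4.0], clip_norm=1.0, noise_multiplier=0.0, rng=rng)
# With a positive multiplier the output is randomly perturbed.
noisy = clip_and_noise([3.0, 4.0], clip_norm=1.0, noise_multiplier=1.1, rng=rng)
```

Clipping bounds how much any single contribution can change the result, which is what allows noise of a fixed scale to give a formal privacy guarantee.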
CLUZH at SIGMORPHON 2022 Shared Tasks on Morpheme Segmentation and Inflection Generation. Wehrli, Silvan; Clematide, Simon; Makarov, Peter (2022).
Elements of a System for Automatic Monitoring of Specific Mental Health Characteristics at Home. Kirsten, Kristina; Arnrich, Bert (2022).
Addressing one’s mental health has never been more important. The incidences of mental diseases, such as depression or anxiety disorders, have drastically increased in recent years. The longer an adequate treatment is delayed, the greater the impact on the severity of the illness which often results in long absences from work. With the development of smart devices and wearables, it is already possible to measure many physiological parameters in everyday life. In addition, monitoring people in their natural environment offers many advantages, e.g. it is not based on retrospective feelings and memories but can measure and reflect the momentary state. This conceptual paper presents an overview of possible elements of a system for automated monitoring of mental health characteristics in the home. We describe examples of typical parameters for various mental disorders and present different systems and methods to measure them. Furthermore, we show how the individual components of a system can be connected to get a holistic view of specific mental health characteristics. Finally, we also discuss challenges and limitations.
Towards Multi-Modal Recordings in Daily Life: A Baseline Assessment of an Experimental Framework. Anders, Christoph; Moontaha, Sidratul; Arnrich, Bert in IS (2022). (Vol. H) 27–30. Information Society.
Background: Wearable devices can record physiological signals from humans to enable an objective assessment of their Mental State. In the future, such devices will enable researchers to work on paradigms outside, rather than only inside, of controlled laboratory environments. This transition requires a paradigm shift in how experiments are conducted, and introduces new challenges. Method: Here, an experimental framework for multi-modal baseline assessments is presented. The developed test battery covers stimuli and questionnaire presenters, and multi-modal data can be recorded in parallel, such as Photoplethysmography, Electroencephalography, Acceleration, and Electrodermal Activity data. The multi-modal data is extracted using a single platform, and synchronized using a shake detection tool. A baseline was recorded from eight participants in a controlled environment. Using Leave-One-Out Cross-Validation, the resampling of data, the ideal window size, and the applicability of Deep Learning for Mental Workload Classification were evaluated. In addition, participants were polled on the acceptance of using the wearable devices. Results: The binary classification performance declined by an average of 7.81% when using eye-blink removal, underlining the importance of data synchronization, correct artefact identification, evaluating and developing artefact removal techniques, and investigating the robustness of the multi-modal setup. Experiments showed that the optimal window size for the acquired data is 30 seconds for Mental Workload classification, with which a Random Forest classifier and an optimized Deep Convolutional Neural Network achieved the best balanced classification accuracies of 70.27% and 74.16%, respectively. Conclusions: This baseline assessment gives valuable insights on how to prototype stimulus presentation with different wearable devices and suggests future work packages, paving the way for researchers to investigate new paradigms outside of controlled environments.
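As a rough illustration of the 30-second windowing named in this abstract, the sketch below cuts a sampled signal into fixed-length windows. The helper `segment_windows` is hypothetical, and the choice of non-overlapping windows and the sampling rate are assumptions for illustration; the abstract does not state either.

```python
def segment_windows(signal, sampling_rate_hz, window_seconds=30):
    """Split a signal into non-overlapping windows of `window_seconds`.

    Hypothetical helper; trailing samples that do not fill a complete
    window are dropped.
    """
    size = int(sampling_rate_hz * window_seconds)
    return [signal[i:i + size] for i in range(0, len(signal) - size + 1, size)]


# 100 seconds of a dummy signal sampled at 1 Hz (assumed rate).
windows = segment_windows(list(range(100)), sampling_rate_hz=1)
```

Each resulting window would then be turned into features or passed to a classifier; that step is not shown here.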
Towards Multi-Modal Recordings in Daily Life: A Baseline Assessment of an Experimental Framework. Moontaha, Sidratul; Anders, Christoph; Arnrich, Bert (2022). 27–30.
Travelers’ information need in automated vehicles - a psychophysiological analysis. Brandebusemeyer, Charlotte; Ihme, Klas; Bosch, Esther (2022). 1–6.
Quantifying Cognitive Load from Voice using Transformer-Based Models and a Cross-Dataset Evaluation. Hecker, Pascal; Kappattanavar, Arpita M.; Schmitt, Maximilian; Moontaha, Sidratul; Wagner, Johannes; Eyben, Florian; Schuller, Björn W.; Arnrich, Bert (2022). 337–344.
Cognitive load is frequently induced in laboratory setups to measure responses to stress, and its impact on voice has been studied in the field of computational paralinguistics. One dataset on this topic was provided in the Computational Paralinguistics Challenge (ComParE) 2014, and therefore offers great comparability. Recently, transformer-based deep learning architectures established a new state-of-the-art and are finding their way gradually into the audio domain. In this context, we investigate the performance of popular transformer architectures in the audio domain on the ComParE 2014 dataset, and the impact of different pre-training and fine-tuning setups on these models. Further, we recorded a small custom dataset, designed to be comparable with the ComParE 2014 one, to assess cross-corpus model generalisability. We find that the transformer models outperform the challenge baseline, the challenge winner, and more recent deep learning approaches. Models based on the ‘large’ architecture perform well on the task at hand, while models based on the ‘base’ architecture perform at chance level. Fine-tuning on related domains (such as ASR or emotion), before fine-tuning on the targets, yields no higher performance compared to models pre-trained only in a self-supervised manner. The generalisability of the models between datasets is more intricate than expected, as seen in an unexpected low performance on the small custom dataset, and we discuss potential ‘hidden’ underlying discrepancies between the datasets. In summary, transformer-based architectures outperform previous attempts to quantify cognitive load from voice. This is promising, in particular for healthcare-related problems in computational paralinguistics applications, since datasets are sparse in that realm.
DPD-fVAE: Synthetic Data Generation Using Federated Variational Autoencoders With Differentially-Private Decoder. Pfitzner, Bjarne; Arnrich, Bert (2022).
Federated learning (FL) is getting increased attention for processing sensitive, distributed datasets common to domains such as healthcare. Instead of directly training classification models on these datasets, recent works have considered training data generators capable of synthesising a new dataset which is not protected by any privacy restrictions. Thus, the synthetic data can be made available to anyone, which enables further evaluation of machine learning architectures and research questions off-site. As an additional layer of privacy-preservation, differential privacy can be introduced into the training process. We propose DPD-fVAE, a federated Variational Autoencoder with Differentially-Private Decoder, to synthesise a new, labelled dataset for subsequent machine learning tasks. By synchronising only the decoder component with FL, we can reduce the privacy cost per epoch and thus enable better data generators. In our evaluation on MNIST, Fashion-MNIST and CelebA, we show the benefits of DPD-fVAE and report competitive performance to related work in terms of Fréchet Inception Distance and accuracy of classifiers trained on the synthesised dataset.
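The idea of synchronising only the decoder can be sketched as follows. This is a simplified illustration under stated assumptions, not the DPD-fVAE implementation: models are plain dictionaries with scalar parameters, the function `average_decoders` is hypothetical, aggregation is assumed to be a dataset-size-weighted average in the style of federated averaging, and the differential privacy step is omitted.

```python
def average_decoders(client_models, client_sizes):
    """Return the size-weighted average of the clients' decoder parameters.

    Hypothetical helper; encoder parameters are never read or shared.
    """
    total = sum(client_sizes)
    keys = client_models[0]["decoder"].keys()
    return {
        k: sum(m["decoder"][k] * n for m, n in zip(client_models, client_sizes)) / total
        for k in keys
    }


clients = [
    {"encoder": {"w": 0.1}, "decoder": {"w": 1.0}},
    {"encoder": {"w": 0.9}, "decoder": {"w": 3.0}},
]
global_decoder = average_decoders(clients, client_sizes=[1, 3])

# Each client adopts the shared decoder but keeps its local encoder.
for model in clients:
    model["decoder"] = dict(global_decoder)
```

Because the encoders stay local in this sketch, only the decoder parameters would need privacy protection when they leave a client.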
SensorHub: Multimodal Sensing in Real-Life Enables Home-Based Studies. Chromik, Jonas; Kirsten, Kristina; Herdick, Arne; Kappattanavar, Arpita Mallikarjuna; Arnrich, Bert in Sensors (2022). 22(1)
Observational studies are an important tool for determining whether the findings from controlled experiments can be transferred into scenarios that are closer to subjects’ real-life circumstances. A rigorous approach to observational studies involves collecting data from different sensors to comprehensively capture the situation of the subject. However, this leads to technical difficulties especially if the sensors are from different manufacturers, as multiple data collection tools have to run simultaneously. We present SensorHub, a system that can collect data from various wearable devices from different manufacturers, such as inertial measurement units, portable electrocardiographs, portable electroencephalographs, portable photoplethysmographs, and sensors for electrodermal activity. Additionally, our tool offers the possibility to include ecological momentary assessments (EMAs) in studies. Hence, SensorHub enables multimodal sensor data collection under real-world conditions and allows direct user feedback to be collected through questionnaires, enabling studies at home. In a first study with 11 participants, we successfully used SensorHub to record multiple signals with different devices and collected additional information with the help of EMAs. In addition, we evaluated SensorHub’s technical capabilities in several trials with up to 21 participants recording simultaneously using multiple sensors with sampling frequencies as high as 1000 Hz. We could show that although there is a theoretical limitation to the transmissible data rate, in practice this limitation is not an issue and data loss is rare. We conclude that with modern communication protocols and with the increasingly powerful smartphones and wearables, a system like our SensorHub establishes an interoperability framework to adequately combine consumer-grade sensing hardware which enables observational studies in real life.
Extracting Alarm Events from the MIMIC-III Clinical Database. Chromik, Jonas; Pfitzner, Bjarne; Ihde, Nina; Michaelis, Marius; Schmidt, Denise; Klopfenstein, Sophie; Poncette, Akira-Sebastian; Balzer, Felix; Arnrich, Bert (2022). 328–335.
Lack of readily available data on ICU alarm events constitutes a major obstacle to alarm fatigue research. There are ICU databases available that aim to give a holistic picture of everything happening at the respective ICU. However, these databases do not contain data on alarm events. We utilise the vital parameters and alarm thresholds recorded in the MIMIC-III database in order to artificially extract alarm events from this database. Prior to that, we uncover, investigate, and mitigate inconsistencies we found in the data. The results of this work are an approach and an algorithm for cleaning the alarm data available in MIMIC-III and extracting concrete alarm events from them. The data set generated by this algorithm is investigated in this work and can be used for further research into the problem of alarm fatigue.
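Deriving alarm events from vital parameters and thresholds can be sketched as below. This is not the paper's algorithm: the function `extract_alarm_events` is hypothetical, the thresholds are assumed to be fixed over time, and the data cleaning steps described in the abstract are left out.

```python
def extract_alarm_events(samples, low, high):
    """Group consecutive threshold violations into alarm events.

    `samples` is a list of (timestamp, value) pairs in time order.
    Returns a list of (start, end, kind) tuples, kind being "low" or "high".
    """
    events = []
    current = None  # [start, end, kind] of the event in progress
    for t, v in samples:
        kind = "low" if v < low else "high" if v > high else None
        if current is not None and kind == current[2]:
            current[1] = t  # same violation continues
        else:
            if current is not None:
                events.append(tuple(current))
            current = [t, t, kind] if kind else None
    if current is not None:
        events.append(tuple(current))
    return events


# Dummy heart-rate readings with assumed thresholds of 50 and 120.
readings = [(0, 80), (1, 125), (2, 130), (3, 90), (4, 45)]
events = extract_alarm_events(readings, low=50, high=120)
```

Here `events` holds one "high" event spanning timestamps 1 to 2 and one "low" event at timestamp 4.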
Computational Approaches to Alleviate Alarm Fatigue in Intensive Care Medicine: A Systematic Literature Review. Chromik, Jonas; Klopfenstein, Sophie Anne Ines; Pfitzner, Bjarne; Sinno, Zeena-Carola; Arnrich, Bert; Balzer, Felix; Poncette, Akira-Sebastian in Frontiers in Digital Health (2022). 4
Patient monitoring technology has been used to guide therapy and alert staff when a vital sign leaves a predefined range in the intensive care unit (ICU) for decades. However, large amounts of technically false or clinically irrelevant alarms provoke alarm fatigue in staff, leading to desensitisation towards critical alarms. With this systematic review, we are following the Preferred Reporting Items for Systematic Reviews and Meta-Analyses (PRISMA) checklist in order to summarise scientific efforts that aimed to develop IT systems to reduce alarm fatigue in ICUs. 69 peer-reviewed publications were included. The majority of publications targeted the avoidance of technically false alarms, while the remainder focused on prediction of patient deterioration or alarm presentation. The investigated alarm types were mostly associated with heart rate or arrhythmia, followed by arterial blood pressure, oxygen saturation, and respiratory rate. Most publications focused on the development of software solutions, some on wearables, smartphones, or head-mounted displays for delivering alarms to staff. The most commonly used statistical models were tree-based. In conclusion, we found strong evidence that alarm fatigue can be alleviated by IT-based solutions. However, future efforts should focus more on the avoidance of clinically non-actionable alarms, which could be accelerated by improving the data availability.
PERSIST: A Multimodal Dataset for the Prediction of Perceived Exertion during Resistance Training. Albert, Justin Amadeus; Herdick, Arne; Brahms, Clemens Markus; Granacher, Urs; Arnrich, Bert in Data (2022). 8(1)
Measuring and adjusting the training load is essential in resistance training, as training overload can increase the risk of injuries. At the same time, too little load does not deliver the desired training effects. Usually, external load is quantified using objective measurements, such as lifted weight distributed across sets and repetitions per exercise. Internal training load is usually assessed using questionnaires or ratings of perceived exertion (RPE). A standard RPE scale is the Borg scale, which ranges from 6 (no exertion) to 20 (the highest exertion ever experienced). Researchers have investigated predicting RPE for different sports using sensor modalities and machine learning methods, such as Support Vector Regression or Random Forests. This paper presents PERSIST, a novel dataset for predicting PERceived exertion during reSIStance Training. We recorded multiple sensor modalities simultaneously, including inertial measurement units (IMU), electrocardiography (ECG), and motion capture (MoCap). The MoCap data has been synchronized to the IMU and ECG data. We also provide heart rate variability (HRV) parameters obtained from the ECG signal. Our dataset contains data from twelve young and healthy male participants with at least one year of resistance training experience. Subjects performed twelve sets of squats on a Flywheel platform with twelve repetitions per set. After each set, subjects reported their current RPE. We chose the squat exercise as it involves the largest muscle group. This paper demonstrates how to access the dataset. We further present an exploratory data analysis and show how researchers can use IMU and ECG data to predict perceived exertion.
Effects of a Supplement Containing a Cranberry Extract on Recurrent Urinary Tract Infections and Intestinal Microbiota: A Prospective, Uncontrolled Exploratory Study. Jeitler, Michael; Michalsen, Andreas; Schwiertz, Andreas; Kessler, Christian S.; Koppold-Liebscher, Daniela; Grasme, Julia; Kandil, Farid I.; Steckhan, Nico in Journal of Integrative and Complementary Medicine (2022). 28(5) 399–406.
Mental and Behavioural Responses to Bahá’í Fasting: Looking behind the Scenes of a Religiously Motivated Intermittent Fast Using a Mixed Methods Approach. Ring, Raphaela M.; Eisenmann, Clemens; Kandil, Farid I.; Steckhan, Nico; Demmrich, Sarah; Klatte, Caroline; Kessler, Christian S.; Jeitler, Michael; Boschmann, Michael; Michalsen, Andreas; Blakeslee, Sarah B.; Stöckigt, Barbara; Stritter, Wiebke; Koppold-Liebscher, Daniela A. in Nutrients (2022). 14(5) 1038.
Voice Analysis for Neurological Disorder Recognition - A Systematic Review and Perspective on Emerging Trends. Hecker, Pascal; Steckhan, Nico; Eyben, Florian; Schuller, Björn W.; Arnrich, Bert in Frontiers in Digital Health (2022). 4
Quantifying neurological disorders from voice is a rapidly growing field of research and holds promise for unobtrusive and large-scale disorder monitoring. The data recording setup and data analysis pipelines are both crucial aspects to effectively obtain relevant information from participants. Therefore, we performed a systematic review to provide a high-level overview of practices across various neurological disorders and highlight emerging trends. PRISMA-based literature searches were conducted through PubMed, Web of Science, and IEEE Xplore to identify publications in which original (i.e., newly recorded) datasets were collected. Disorders of interest were psychiatric as well as neurodegenerative disorders, such as bipolar disorder, depression, and stress, as well as amyotrophic lateral sclerosis, Alzheimer's, and Parkinson's disease, and speech impairments (aphasia, dysarthria, and dysphonia). Of the 43 retrieved studies, Parkinson's disease is represented most prominently with 19 discovered datasets. Free speech and read speech tasks are most commonly used across disorders. Besides popular feature extraction toolkits, many studies utilise custom-built feature sets. Correlations of acoustic features with psychiatric and neurodegenerative disorders are presented. In terms of analysis, statistical analysis for significance of individual features is commonly used, as well as predictive modeling approaches, especially with support vector machines and a small number of artificial neural networks. An emerging trend and recommendation for future studies is to collect data in everyday life to facilitate longitudinal data collection and to capture the behavior of participants more naturally. Another emerging trend is to record additional modalities to voice, which can potentially increase analytical performance.
Hand Gesture Recognition in Daily Life as an Additional Tool for Unobtrusive Data Labeling in Medical Studies. Joch, Julia; Kirsten, Kristina; Arnrich, Bert (2022).
For many use cases, such as supervised machine learning, labeled data is needed. However, to collect information for labels in real-life contexts, scientists are confronted with the challenge of gathering labeled data over an extended period. Labeling this data can become problematic, as constant supervision, similar to a laboratory setting, is neither feasible nor desired. Therefore, participants of such studies have to label their data themselves via appropriate apps on a smartphone. Nevertheless, this process can become very obtrusive in daily life and might even influence the results, especially in studies regarding emotions. For example, in studies where participants need to indicate their stress levels frequently, labels get missed in situations where it would be inappropriate to take out the phone. Consequently, missing these labels presents a significant problem. This paper aims to provide an unobtrusive solution to labeling data in real-world studies. We recorded a dataset consisting of five gestures and data from daily life. Thereby, we provide a set of predefined gestures that can be distinguished from other everyday life activities by using accelerometer and gyroscope sensors of wearable devices on the wrist. The use of predefined hand gestures for labeling data can therefore serve as an additional tool for the labeling process. Two machine learning approaches were compared and achieved promising results with Matthews Correlation Coefficients of up to 0.789 for a Random Forest and up to 0.835 for a Convolutional Neural Network.
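For readers unfamiliar with the metric reported above, the sketch below computes the Matthews Correlation Coefficient for the binary case from the confusion-matrix counts. It only illustrates the standard binary formula: the labels are made up, and a study with five gestures would presumably use the multiclass generalisation, which is not shown.

```python
import math


def matthews_corrcoef(y_true, y_pred):
    """Binary MCC: (TP*TN - FP*FN) / sqrt((TP+FP)(TP+FN)(TN+FP)(TN+FN))."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    # By convention the score is 0 when any marginal count is zero.
    return (tp * tn - fp * fn) / denom if denom else 0.0


# Made-up labels: 1 = gesture, 0 = other activity.
score = matthews_corrcoef([1, 1, 1, 0, 0, 0], [1, 1, 0, 0, 0, 1])
```

The score ranges from -1 to 1, with 0 corresponding to chance-level prediction, which makes it more informative than accuracy on imbalanced data.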
Unsupervised Activity Recognition Using Trajectory Heatmaps from Inertial Measurement Unit Data. Konak, Orhan; Wegner, Pit; Albert, Justin; Arnrich, Bert (2022). 304–312.
Have Your Cake and Log it Too: A Pilot Study Leveraging IMU Sensors for Real-time Food Journaling Notifications. Kappattanavar, Arpita; Kremser, Marten; Arnrich, Bert (2022). (Vol. 5) 532–541.
Wearable electroencephalography and multi-modal mental state classification: A systematic literature review. Anders, Christoph; Arnrich, Bert in Computers in Biology and Medicine (2022). 150 106088.
Background: Wearable multi-modal time-series classification applications outperform their best uni-modal counterparts and hold great promise. A modality that directly measures electrical correlates from the brain is electroencephalography. Due to varying noise sources, different key brain regions, key frequency bands, and signal characteristics like non-stationarity, techniques for data pre-processing and classification algorithms are task-dependent. Method: Here, a systematic literature review on mental state classification for wearable electroencephalography is presented. Four search terms in different combinations were used for an in-title search. The search was executed on the 29th of June 2022, across Google Scholar, PubMed, IEEEXplore, and ScienceDirect. 76 most relevant publications were set into context as the current state-of-the-art in mental state time-series classification. Results: Pre-processing techniques, features, and time-series classification models were analyzed. Across publications, a window length of one second was mainly chosen for classification and spectral features were utilized the most. The achieved performance per time-series classification model is analyzed, finding linear discriminant analysis, decision trees, and k-nearest neighbors models outperform support-vector machines by a factor of up to 1.5. A historical analysis depicts future trends while under-reported aspects relevant to practical applications are discussed. Conclusions: Five main conclusions are given, covering utilization of available area for electrode placement on the head, most often or scarcely utilized features and time-series classification model architectures, baseline reporting practices, as well as explainability and interpretability of Deep Learning. The importance of a ‘test battery’ assessing the influence of data pre-processing and multi-modality on time-series classification performance is emphasized.