Biomarker Extraction from Neuroimaging Data

Shahryar Khorasani

Machine Learning & Digital Health
Hasso Plattner Institute

Office: G-2.1.33
Tel.: +49-(0)331 5509-4874
Email: Shahryar.Khorasani(at)hpi.de
Links: Homepage

Supervisor: Prof. Dr. Christoph Lippert

Overview

Brain structures

In this project, I search for structural features in brain MRI that help differentiate between healthy subjects
and subjects affected by neurodegenerative diseases. I use convolutional autoencoder neural networks to map
high-dimensional features into a lower-dimensional space, and I apply multi-task learning to increase the information
captured in that space.

Introduction

Biomarkers are essential to disease diagnosis, prognosis and prediction. Developing accurate biomarkers helps narrow down causal pathologies and treatment plans. In neuroscience research, magnetic resonance imaging (MRI) provides imaging biomarkers that capture structural features throughout the brain and within brain regions. These complex geometric features are very informative when represented as images. However, large-scale analyses typically use only the thickness, area or volume of these anatomical structures, which potentially discards information. In this project, we address this through multi-task learning. Using a convolutional autoencoder, we capture the information necessary to reconstruct the brain region. By adding auxiliary tasks such as predicting the structure's volume and its class, we attempt to force more information into the hidden space. We use this hidden space to create new biomarkers that could potentially carry more information than the conventional neuroimaging structural values.

In a multi-task setting, we define one or more auxiliary tasks to improve performance on the main task. Multi-task learning uses the domain-specific information captured by learning the auxiliary tasks to improve generalization. In neural networks, multi-task learning involves sharing parameters of the hidden layers. The most common form is hard parameter sharing, in which task-specific layers branch from a common stem architecture. For this project, we use this approach because it allows us to enforce constraints on a specific layer in the hidden space, which can later be extracted and processed as deep features.
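
Hard parameter sharing can be illustrated with a minimal NumPy sketch. It is not the project's model: the dense layers, the layer sizes and the names `shared_stem`, `reconstruction_head` and `volume_head` are assumptions chosen for illustration, whereas the actual architecture is convolutional. The sketch only shows that one hidden representation feeds every task-specific branch.

```python
import numpy as np

rng = np.random.default_rng(0)

def init_linear(n_in, n_out):
    """Return (weights, bias) for a dense layer with small random weights."""
    return rng.normal(0.0, 0.1, size=(n_in, n_out)), np.zeros(n_out)

def linear(x, params):
    """Apply a dense layer to a batch of row vectors."""
    weights, bias = params
    return x @ weights + bias

# Hypothetical sizes: 64 input features, 8-dimensional shared hidden space.
N_IN, N_HIDDEN = 64, 8

shared_stem = init_linear(N_IN, N_HIDDEN)          # parameters shared by all tasks
reconstruction_head = init_linear(N_HIDDEN, N_IN)  # main task: reconstruct the input
volume_head = init_linear(N_HIDDEN, 1)             # auxiliary task: predict one scalar

def forward(x):
    """Run the shared stem once, then branch into the task-specific heads."""
    hidden = np.tanh(linear(x, shared_stem))  # layer later extracted as deep features
    reconstruction = linear(hidden, reconstruction_head)
    volume = linear(hidden, volume_head)
    return hidden, reconstruction, volume

batch = rng.normal(size=(4, N_IN))
hidden, reconstruction, volume = forward(batch)
print(hidden.shape, reconstruction.shape, volume.shape)
```

Because both heads read from `hidden`, gradients from both losses flow into `shared_stem` during training, which is how the auxiliary task can shape the layer that is later used as a feature vector.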

Not all auxiliary tasks improve the model. Some tasks are cooperative, while others are competitive and can reduce performance. Because tasks vary in difficulty, the weights of their losses need to be adjusted when calculating the multi-objective loss. Tuning these weights manually is usually expensive and time-consuming. Therefore, multi-task balancing techniques have been developed that allow the model to learn suitable weights automatically. In this project, we use a balancing technique known as gradient normalization [1].
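
The idea behind gradient normalization [1] can be sketched as a single update of the loss weights. This is a simplified illustration, not the training code used in this project. It assumes that the gradient norm of each unweighted task loss with respect to the shared weights has already been computed by an autodiff framework. The function name `gradnorm_step` and the values of `alpha`, `lr` and the example inputs are illustrative assumptions.

```python
import numpy as np

def gradnorm_step(weights, grad_norms, losses, initial_losses, alpha=1.5, lr=0.025):
    """One GradNorm-style update of the task loss weights.

    weights        : current loss weights w_i, shape (T,), assumed positive
    grad_norms     : norm of the gradient of each unweighted task loss
                     with respect to the shared weights, shape (T,)
    losses         : current task losses L_i(t), shape (T,)
    initial_losses : task losses at the first training step L_i(0), shape (T,)
    """
    weighted_norms = weights * grad_norms            # G_i: gradient norm of w_i * L_i
    mean_norm = weighted_norms.mean()                # average gradient norm over tasks
    loss_ratios = losses / initial_losses            # small ratio = task has trained fast
    relative_rates = loss_ratios / loss_ratios.mean()
    targets = mean_norm * relative_rates ** alpha    # target norms, treated as constants
    # Gradient of sum_i |G_i - target_i| with respect to w_i, using G_i = w_i * grad_norms_i.
    grad_w = np.sign(weighted_norms - targets) * grad_norms
    new_weights = weights - lr * grad_w
    # Renormalize so that the weights sum to the number of tasks.
    return new_weights * len(weights) / new_weights.sum()

# Illustrative numbers: task 0 has small gradients and has barely improved,
# task 1 has large gradients and has already improved a lot.
new_weights = gradnorm_step(
    weights=np.array([1.0, 1.0]),
    grad_norms=np.array([0.2, 2.0]),
    losses=np.array([0.8, 0.1]),
    initial_losses=np.array([1.0, 1.0]),
)
print(new_weights)
```

In this example the weight of the slower task with the smaller gradient increases and the weight of the faster task decreases, which is the balancing behaviour the technique aims for.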

Data

For model training we use the UK Biobank data set, which comprises 40,682 subjects with annotated T1 structural MRI scans, structural volumetric measurements, genetic data and clinical reports. We focus on the lateral, the third and the fourth ventricles, as they are highly associated with Alzheimer's Disease. For testing we use the Alzheimer's Disease Neuroimaging Initiative (ADNI) data set, comprising 1200 subjects separated into 3 groups: Cognitively Normal, Mild Cognitive Impairment and Alzheimer's Disease. This allows us to evaluate the transferability of the deep features learnt from the healthy subjects from the UK Biobank.

Figure 1: Convolutional autoencoder with an auxiliary regression task, used to reconstruct the ventricles and predict their volume, respectively.

Figure 2: Results of the two-sample tests for the unsupervised approach on T1 whole-brain MRI scans (top) and the ventricle-focused training with multi-task learning (bottom). AD: Alzheimer's Disease; MCI: Mild Cognitive Impairment; CN: Cognitively Normal.

Experimental Setup & Preliminary Result

In previous work, we showed that a convolutional autoencoder can extract diagnostically informative deep features from structural brain MRI [2]. By applying a maximum mean discrepancy test to the deep features, we were able to differentiate between Alzheimer’s disease, mild cognitive impairment and cognitively normal subjects, as well as carriers of the major genetic risk factor for Alzheimer’s disease, apolipoprotein E epsilon 4 (APOE4).
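
A two-sample test based on the maximum mean discrepancy (MMD) can be sketched as follows. This is a generic kernel MMD with a permutation p-value, not necessarily the exact test statistic used in [2]. The Gaussian kernel, the median-heuristic bandwidth, the number of permutations and the synthetic feature vectors are assumptions made for illustration.

```python
import numpy as np

def rbf_kernel(a, b, bandwidth):
    """Gaussian kernel matrix between the rows of a and the rows of b."""
    sq_dists = (
        np.sum(a**2, axis=1)[:, None]
        + np.sum(b**2, axis=1)[None, :]
        - 2.0 * a @ b.T
    )
    return np.exp(-np.maximum(sq_dists, 0.0) / (2.0 * bandwidth**2))

def mmd2(x, y, bandwidth):
    """Biased estimate of the squared MMD between samples x and y."""
    return (
        rbf_kernel(x, x, bandwidth).mean()
        + rbf_kernel(y, y, bandwidth).mean()
        - 2.0 * rbf_kernel(x, y, bandwidth).mean()
    )

def mmd_permutation_test(x, y, n_permutations=200, seed=0):
    """Return the observed squared MMD and a permutation p-value."""
    rng = np.random.default_rng(seed)
    pooled = np.vstack([x, y])
    # Median heuristic: bandwidth = median pairwise distance in the pooled sample.
    diffs = pooled[:, None, :] - pooled[None, :, :]
    dists = np.sqrt((diffs**2).sum(axis=-1))
    bandwidth = np.median(dists[dists > 0])
    observed = mmd2(x, y, bandwidth)
    n = len(x)
    count = 0
    for _ in range(n_permutations):
        perm = rng.permutation(len(pooled))  # shuffle the group labels
        if mmd2(pooled[perm[:n]], pooled[perm[n:]], bandwidth) >= observed:
            count += 1
    p_value = (count + 1) / (n_permutations + 1)
    return observed, p_value

# Synthetic stand-ins for deep features of two groups (not real data).
data_rng = np.random.default_rng(1)
group_a = data_rng.normal(0.0, 1.0, size=(40, 5))
group_b = data_rng.normal(1.0, 1.0, size=(40, 5))
observed, p_value = mmd_permutation_test(group_a, group_b)
print(observed, p_value)
```

A small p-value indicates that the two groups of feature vectors are unlikely to come from the same distribution, which is the sense in which the deep features are said to differentiate between groups.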

While the previous work used T1 brain MRI scans and took an unsupervised approach, in this work we used ventricle segmentations as model input and trained a convolutional autoencoder with an auxiliary regression task to predict the volume of the ventricles. We used generalized Dice loss and mean squared error for reconstruction and regression, respectively. When calculating the total loss, we multiplied the reconstruction loss by a factor of 10 to counteract task imbalance. The model was trained and validated on separate sets of 5000 subjects from the UK Biobank and evaluated on 1200 subjects from the ADNI data set. We generated the deep features for these ADNI subjects, calculated their first principal components and compared these features with the previous features generated from the T1 whole-brain MRI scans (Figure 1).
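
The combined objective described above can be written down as a short NumPy sketch. It assumes that segmentations are one-hot encoded with shape (classes, voxels) and uses the common inverse squared class volume weighting for the generalized Dice loss. Only the factor of 10 comes from the text. The function names and the toy inputs are illustrative.

```python
import numpy as np

RECONSTRUCTION_WEIGHT = 10.0  # factor applied to the reconstruction loss, as stated in the text

def generalized_dice_loss(pred, target, eps=1e-7):
    """Generalized Dice loss for arrays of shape (n_classes, n_voxels).

    pred holds per-class probabilities, target is one-hot encoded.
    Each class is weighted by the inverse of its squared volume.
    """
    class_weights = 1.0 / (target.sum(axis=1) ** 2 + eps)
    intersection = (class_weights * (pred * target).sum(axis=1)).sum()
    union = (class_weights * (pred + target).sum(axis=1)).sum()
    return 1.0 - 2.0 * intersection / (union + eps)

def mean_squared_error(pred, target):
    """Mean squared error for the auxiliary volume regression."""
    return np.mean((pred - target) ** 2)

def total_loss(recon_pred, recon_target, volume_pred, volume_target):
    """Weighted sum of the reconstruction and regression losses."""
    reconstruction = generalized_dice_loss(recon_pred, recon_target)
    regression = mean_squared_error(volume_pred, volume_target)
    return RECONSTRUCTION_WEIGHT * reconstruction + regression

# Toy example: 2 classes (background, ventricle) over 6 voxels.
target = np.array([[1, 1, 1, 1, 0, 0],
                   [0, 0, 0, 0, 1, 1]], dtype=float)
perfect = target.copy()
inverted = 1.0 - target

loss_perfect = generalized_dice_loss(perfect, target)
loss_inverted = generalized_dice_loss(inverted, target)
loss_total = total_loss(perfect, target, np.array([1.0]), np.array([3.0]))
print(loss_perfect, loss_inverted, loss_total)
```

With a perfect reconstruction the Dice term is close to zero and the total is dominated by the regression error, while a fully wrong reconstruction gives a Dice term close to one, which the factor of 10 scales up relative to the regression loss.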

We achieved higher accuracy in the two-sample tests than in our previous experiments (Figure 2).

References

[1] Z. Chen, V. Badrinarayanan, C.-Y. Lee, and A. Rabinovich. GradNorm: Gradient Normalization for Adaptive Loss Balancing in Deep Multitask Networks. 2018. arXiv:1711.02257 [cs.CV].

[2] M. Kirchler, S. Khorasani, M. Kloft, and C. Lippert. “Two-sample Testing Using Deep Learning”. In: Proceedings of the Twenty Third International Conference on Artificial Intelligence and Statistics. Edited by S. Chiappa and R. Calandra. Volume 108. Proceedings of Machine Learning Research. Online: PMLR, 2020, pages 1387–1398.

Published Research

Kirchler, M., Khorasani, S., Kloft, M., & Lippert, C. (2019). Two-sample Testing Using Deep Learning. arXiv:1910.06239.

Konigorski, S., Khorasani, S., & Lippert, C. (2018). Integrating omics and MRI data with kernel-based tests and CNNs to identify rare genetic markers for Alzheimer’s disease. 32nd Conference on Neural Information Processing Systems (NeurIPS), arXiv:1812.00448.

Masoudi, R., Mazaheri-Asadi, L., & Khorasani, S. (2016). Partial and complete microdeletions of Y chromosome in infertile males from South of Iran. Molecular Biology Research Communications, 5(4), 247–255, PMCID: PMC5326488.