Multi-task learning for extracting biomarkers from neuroimaging data

Shahryar Khorasani

Machine Learning & Digital Health
Hasso Plattner Institute

Office: G-2.1.33
Tel.: +49-(0)331 5509-4874
Email: Shahryar.Khorasani(at)hpi.de
Links: Homepage

Supervisor: Prof. Dr. Christoph Lippert

Motivation

In neuroscience research, high-dimensional geometric brain structures are often represented only by their thickness, area, or volume. We believe that deep neural networks can extract biomarkers that carry more information than these conventional structural measurements. Convolutional neural networks have been applied to brain structural magnetic resonance imaging (MRI) data to solve tasks such as structural segmentation, diagnostic classification, and registration. However, their potential for biomarker extraction is yet to be explored. In previous work, we showed that a convolutional autoencoder can extract diagnostically informative deep features from structural brain MRI. By applying a maximum mean discrepancy test to these deep features, we were able to differentiate between subjects with Alzheimer's disease, cognitive impairment, and normal cognition, as well as carriers of the major genetic risk factor for Alzheimer's disease, Apolipoprotein epsilon 4 (APOE4) [2]. However, we are still far from replacing conventional structural phenotypes with deep features in neuroscience research. We aim to address this by imposing constraints on the deep features using a multi-task learning approach.

Multi-task learning

In a multi-task setting, we define one or more auxiliary tasks to improve performance on the main task. Multi-task learning uses the domain-specific information captured while learning the auxiliary tasks to improve generalization. In neural networks, multi-task learning involves parameter sharing of the hidden layers. The most common form is hard parameter sharing, in which task-specific layers branch from a common stem architecture. For this project, we use this approach because it allows us to enforce constraints on a specific layer in the hidden space that can later be extracted and processed as deep features.
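
As a toy illustration of hard parameter sharing (the layer types and sizes below are placeholders, not our actual network), a PyTorch sketch could look like this:

import torch
import torch.nn as nn

class HardSharedNet(nn.Module):
    """Toy hard parameter sharing: one shared stem, one head per task."""
    def __init__(self, in_dim=128, hidden=64, n_classes=10):
        super().__init__()
        # Parameters in the stem are shared by every task.
        self.stem = nn.Sequential(nn.Linear(in_dim, hidden), nn.ReLU())
        # Task-specific branches start here.
        self.class_head = nn.Linear(hidden, n_classes)
        self.reg_head = nn.Linear(hidden, 1)

    def forward(self, x):
        shared = self.stem(x)  # last shared representation
        return self.class_head(shared), self.reg_head(shared)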

Not all auxiliary tasks will improve the model. Some tasks are cooperative while others are competitive and can reduce performance [3]. Because different tasks have varying levels of difficulty, one needs to adjust the weights of their losses when computing the multi-objective loss. Tuning these weights manually is usually expensive and time-consuming. Therefore, multi-task balancing techniques have been developed that let the model learn the proper weights automatically and counteract task imbalance. In this project, we use gradient normalization [1] to adaptively adjust the task-specific weights.
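
As a rough sketch of how such a GradNorm update could be implemented (the function name, the alpha value, and the way the shared parameters are passed in are illustrative assumptions, not our exact code):

import torch

def gradnorm_loss(task_losses, loss_weights, shared_params, initial_losses, alpha=1.5):
    """Compute the GradNorm balancing loss (Chen et al., 2018) for the task weights.

    task_losses    : list of scalar task losses L_i (graphs attached)
    loss_weights   : nn.Parameter of shape (T,), the learnable weights w_i
    shared_params  : list of tensors of the last shared layer (the bottleneck)
    initial_losses : losses recorded at the first iteration, L_i(0)
    """
    # Norm of the gradient of each weighted task loss w.r.t. the shared layer.
    grad_norms = []
    for w_i, L_i in zip(loss_weights, task_losses):
        g = torch.autograd.grad(w_i * L_i, shared_params,
                                retain_graph=True, create_graph=True)
        grad_norms.append(torch.norm(torch.cat([gi.flatten() for gi in g])))
    grad_norms = torch.stack(grad_norms)

    # Relative inverse training rates decide which tasks should get larger gradients.
    with torch.no_grad():
        loss_ratios = torch.stack([L.detach() / L0 for L, L0 in zip(task_losses, initial_losses)])
        inverse_rates = loss_ratios / loss_ratios.mean()
        target = grad_norms.mean() * inverse_rates ** alpha  # treated as a constant

    # Pull each task's gradient norm toward its target.
    return torch.nn.functional.l1_loss(grad_norms, target, reduction='sum')

The returned loss is backpropagated only into the task weights; after each update the weights are typically renormalized so that they sum to the number of tasks.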

Data and Experimental Setup

We work with the UK Biobank, which comprises 500,000 subjects and includes neuroimaging, cognitive, clinical, and genetic data. For this project, we limit our analysis to the 40,682 subjects who have annotated T1 structural MRI scans with volumetric measurements (figure 1).

Figure 1: Gray matter segmentation is one of the many structural annotations that we use for biomarker extraction.

The main task in our multi-task learning setting is to reconstruct the structural annotations (figure 2). The first auxiliary task is to predict the class of the annotation, and the second auxiliary task is to predict its volume. Thus, we start by training a convolutional autoencoder and add one branch of layers to perform classification and another branch to perform regression. We need to compress the data as much as possible while still being able to solve all tasks. Therefore, we choose the bottleneck of the autoencoder as the branching point at which to enforce our multi-objective constraint. This layer holds the last parameters shared between all tasks and is the one used for applying gradient normalization.

Figure 2: In this multi-task setting, the encoder compresses the input annotation into a vector in the feature space. The decoder reconstructs the annotation. Task-specific layers in branch 1 predict the class of the annotation in a multi-class classification setting. Task-specific layers in branch 2 predict the volume of the annotation in a regression setting.
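
A minimal PyTorch sketch of this architecture (the latent dimension, layer sizes, and the assumed 16x16x16 input resolution are illustrative choices, not the actual model):

import torch
import torch.nn as nn

class MultiTaskAutoencoder(nn.Module):
    """Convolutional autoencoder whose bottleneck feeds two auxiliary heads."""
    def __init__(self, latent_dim=64, n_annotation_classes=10):
        super().__init__()
        # Encoder: compresses a 1x16x16x16 annotation volume into the bottleneck vector.
        self.encoder = nn.Sequential(
            nn.Conv3d(1, 16, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv3d(16, 32, 3, stride=2, padding=1), nn.ReLU(),
            nn.Flatten(), nn.Linear(32 * 4 * 4 * 4, latent_dim),
        )
        # Decoder: main task, reconstruct the annotation from the bottleneck.
        self.decoder = nn.Sequential(
            nn.Linear(latent_dim, 32 * 4 * 4 * 4), nn.ReLU(),
            nn.Unflatten(1, (32, 4, 4, 4)),
            nn.ConvTranspose3d(32, 16, 4, stride=2, padding=1), nn.ReLU(),
            nn.ConvTranspose3d(16, 1, 4, stride=2, padding=1),
        )
        # Task-specific branches starting at the bottleneck (the branching point).
        self.class_head = nn.Linear(latent_dim, n_annotation_classes)  # branch 1
        self.volume_head = nn.Linear(latent_dim, 1)                    # branch 2

    def forward(self, x):
        z = self.encoder(x)  # bottleneck vector = deep features / last shared layer
        return self.decoder(z), self.class_head(z), self.volume_head(z)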

To find the right design for our neural network model, we first attempt to solve each task separately. Subsequently, we run the multi-task experiments, first using only one auxiliary task and later both. Each experiment is run once with and once without gradient normalization. Next, we compute the first principal component of the deep features, because we need to represent each brain structure with a single variable. To evaluate the deep features, we focus on phenotypes that are associated with specific brain structures. For example, we know there is a correlation between a subject's age and the volume of their cerebrospinal fluid, because the brain undergoes neurodegeneration as the subject ages. In this scenario, we can use a linear regression model to test the deep features extracted from the cerebrospinal fluid annotations against the volume of the cerebrospinal fluid.

One of the best-known causes of neurodegeneration is Alzheimer's disease. Using the same deep features and volumetric measurements as before, we can try to predict whether a subject has been diagnosed with Alzheimer's disease, this time using a logistic regression model.
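
As a sketch of this evaluation pipeline with scikit-learn (the arrays below are synthetic placeholders standing in for the real deep features and UK Biobank phenotypes):

import numpy as np
from sklearn.decomposition import PCA
from sklearn.linear_model import LinearRegression, LogisticRegression

# Synthetic placeholders: deep features of the cerebrospinal fluid annotation per subject,
# the conventional volumetric phenotype, and the Alzheimer's diagnosis label.
rng = np.random.default_rng(0)
deep_features = rng.standard_normal((1000, 64))   # (subjects, bottleneck dimension)
csf_volume = rng.standard_normal(1000)
ad_diagnosis = rng.integers(0, 2, 1000)           # 1 = diagnosed with Alzheimer's disease

# Represent each brain structure by the first principal component of its deep features.
pc1 = PCA(n_components=1).fit_transform(deep_features)

# Linear regression: does the deep-feature summary track the conventional volume?
lin = LinearRegression().fit(pc1, csf_volume)
print("R^2 against CSF volume:", lin.score(pc1, csf_volume))

# Logistic regression: can the same summary separate diagnosed from undiagnosed subjects?
logit = LogisticRegression().fit(pc1, ad_diagnosis)
print("Alzheimer's classification accuracy:", logit.score(pc1, ad_diagnosis))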

To evaluate our approach in a multi-modal analysis, we perform a genome-wide association study (GWAS). In a GWAS, we investigate the correlation between genetic variation within a population and a phenotype (here, the volume of a brain structure). When a statistically significant number of individuals who carry the same genetic variant have similar phenotypic measurements, that genetic site and the phenotype are considered associated.
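
A toy version of such a single-variant association scan could look as follows (genotypes and phenotype are synthetic placeholders; a real GWAS would additionally adjust for covariates such as age, sex, and population structure and rely on dedicated tooling):

import numpy as np
from scipy import stats

# Synthetic placeholders: minor-allele counts (0/1/2) per subject and variant,
# and a continuous phenotype such as a deep-feature summary of one brain structure.
rng = np.random.default_rng(0)
genotypes = rng.integers(0, 3, size=(1000, 5000))   # (subjects, variants)
phenotype = rng.standard_normal(1000)

# Single-variant association: regress the phenotype on each variant and keep the p-value.
p_values = np.array([
    stats.linregress(genotypes[:, j], phenotype).pvalue
    for j in range(genotypes.shape[1])
])

# Conventional genome-wide significance threshold.
significant = np.where(p_values < 5e-8)[0]
print(f"{len(significant)} variants pass the genome-wide threshold")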

References

[1] Z. Chen, V. Badrinarayanan, C. Lee, and A. Rabinovich. "GradNorm: Gradient Normalization for Adaptive Loss Balancing in Deep Multitask Networks". In: ICML (2018).

[2] M. Kirchler, S. Khorasani, M. Kloft, and C. Lippert. “Two-sample Testing Using Deep Learning”. In: arXiv preprint abs/1910.06239 (2019).

[3] S. Ruder. “An Overview of Multi-Task Learning in Deep Neural Networks”. In: arXiv preprint abs/1706.05098 (2017).

Published Research

Kirchler, M., Khorasani, S., Kloft, M., & Lippert, C. (2019). "Two-sample Testing Using Deep Learning". In: arXiv: 1910.06239.

Konigorski, S., Khorasani, S., & Lippert, C. (2018). Integrating omics and MRI data with kernel-based tests and CNNs to identify rare genetic markers for Alzheimer's disease. 32nd Conference on Neural Information Processing Systems (NeurIPS), arXiv:1812.00448.

Masoudi, R., Mazaheri-Asadi, L., & Khorasani, S. (2016). Partial and complete microdeletions of Y chromosome in infertile males from South of Iran. Molecular Biology Research Communications, 5(4), 247–255, PMCID: PMC5326488.