Prof. Dr. Christoph Lippert

Large-Scale Medical Image Analysis

Project description

Medical image data vary widely in dimensionality and size (see Figure 1). For example, X-ray scans are 2D images with megapixel dimensions. Magnetic Resonance (MR) and Computed Tomography (CT) scans additionally have a depth dimension, which makes them considerably larger. Microscope slides encountered in pathology can reach the gigapixel range.

Figure 1: Exemplary medical image data and resolutions; Left: Chest X-ray, Center: Brain MR scan, Right: Microscope slide (patch)

A modern approach to medical image analysis relies on deep learning algorithms that form hierarchical feature representations of the image input. These high-dimensional features must be kept in memory during training, which leads to excessive memory usage on graphics processing units (GPUs). For that reason, today's deep learning algorithms focus primarily on low-resolution images; for instance, images from the ImageNet dataset are typically resized to 224x224 pixels. Applying deep learning to large medical images of all kinds therefore requires novel methods. In this project, we strive to develop methods that fulfill the following requirements:
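A back-of-envelope calculation illustrates why resolution dominates memory usage. The sketch below estimates the size of a single float32 feature map for a 224x224 input versus a gigapixel slide; the channel count and slide dimensions are illustrative assumptions, not measurements from any particular model.

```python
# Rough estimate of activation memory for one convolutional feature map,
# illustrating why high-resolution inputs exhaust GPU memory.
# channels=64 and the 100,000 x 100,000 slide size are assumed values.

def feature_map_bytes(height, width, channels=64, bytes_per_value=4):
    """Memory for a single float32 feature map of the given spatial size."""
    return height * width * channels * bytes_per_value

# A 224x224 ImageNet-style input vs. a 100,000 x 100,000 gigapixel slide.
small = feature_map_bytes(224, 224)
large = feature_map_bytes(100_000, 100_000)

print(f"224x224 feature map:   {small / 1e6:.1f} MB")   # ~12.8 MB
print(f"Gigapixel feature map: {large / 1e9:.1f} GB")   # ~2560 GB
```

A deep network stores many such feature maps at once for backpropagation, so the gap in practice is even larger than this single-layer estimate suggests.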

  • Process only the relevant image parts in detail
  • Scale to arbitrarily many relevant image parts
  • Scale to arbitrarily large input images, such as gigapixel microscope slides
  • Train on a single GPU
  • Model dependencies between object parts


Meeting the above requirements enables analysis of the ever-increasing volume of medical image data while reducing compute time, energy, and hardware resources. A high-level depiction of our approach is shown in Figure 2.

Figure 2: High-level architecture of our approach.

The original image is first reduced in size and fed to a deep learning model. This allows the model to localize interesting image regions in the potentially blurred low-resolution image while saving memory. The computed locations are then used to extract task-relevant patches from the original high-resolution image; intuitively, this corresponds to a person focusing on specific parts of a scene. These patches are also passed through the model. Finally, the representations of the patches and of the low-resolution image produced by the embedding model are aggregated and related to each other to yield the final prediction.
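The two-stage pipeline described above can be sketched in a few lines. The sketch below uses plain Python lists as stand-in "images", a toy intensity-based saliency score, and mean pooling as a stand-in aggregation; the function names and scoring rule are hypothetical placeholders, not the project's actual architecture.

```python
# Minimal sketch of the downsample -> localize -> extract -> aggregate
# pipeline. All function names and the toy saliency/aggregation rules
# are illustrative assumptions.

def downsample(image, factor):
    """Keep every `factor`-th row and column of a 2D image."""
    return [row[::factor] for row in image[::factor]]

def score_regions(small_image):
    """Toy saliency: (row, col) coordinates sorted by pixel intensity."""
    coords = [(r, c) for r, row in enumerate(small_image)
              for c, _ in enumerate(row)]
    return sorted(coords, key=lambda rc: small_image[rc[0]][rc[1]],
                  reverse=True)

def extract_patch(image, row, col, size):
    """Cut a size x size patch from the full-resolution image."""
    return [r[col:col + size] for r in image[row:row + size]]

def predict(image, factor=2, patch_size=2, top_k=2):
    small = downsample(image, factor)
    # Map the most salient low-resolution locations back to full resolution.
    locations = score_regions(small)[:top_k]
    patches = [extract_patch(image, r * factor, c * factor, patch_size)
               for r, c in locations]
    # Stand-in aggregation: mean intensity over patches plus low-res context.
    values = [v for p in patches for row in p for v in row]
    values += [v for row in small for v in row]
    return sum(values) / len(values)
```

In the real system, the toy saliency and mean-pooling steps would be replaced by learned components, but the memory advantage is already visible here: only the small image and a handful of patches are processed at full detail, never the entire high-resolution input at once.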


Are you interested in this project? Contact us!