Hasso-Plattner-Institut
Prof. Dr. Tilmann Rabl
  
 

Deep Earth Query - Advances in Remote Sensing Image Characterization and Indexing from Massive Archives

Begüm Demir, TU Berlin

Presenter & Research

Prof. Dr. Begüm Demir leads the Remote Sensing Image Analysis Group at the faculty of Computer Science at TU Berlin. Her group mainly develops machine learning algorithms for information discovery from big Earth observation archives.
These archives contain a huge number of multispectral images acquired by satellites.
The goal of this research is to retrieve information about the Earth's surface, for example for agricultural monitoring, icebreaker events or the retreat of the Aral Sea. Multispectral images capture more spectral channels than the visible light the human eye can see, and therefore carry more information than regular images.

The images are provided by the European Space Agency (ESA) through its Sentinel satellites. The ESA adds 10 TB of data per day to its archive, which is freely accessible to everyone.

A recording of the presentation is available on Tele-Task.

Summary

Information Discovery: Query by Example

Large data archives raise one central problem: how can information be discovered in them? One solution is Query by Example, also known as Content-Based Image Retrieval (CBIR). Given an example image, the system searches its database for images with similar semantic features.
To do so, the system first extracts descriptors (features) from the query image and from all images in the database, and then compares the descriptors of the query image with those of the database images.

There are many ways to compute these image descriptors. The presenter demonstrated different approaches, such as an algorithm that uses the histogram of an entire image or of a smaller portion of it.
Another approach is the Scale-Invariant Feature Transform (SIFT), an algorithm that detects and describes local features in images.
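The histogram-based descriptor can be sketched in a few lines of Python. This is a minimal sketch, assuming a single-band 8-bit image stored as nested lists and an arbitrary choice of 4 equal-width bins; the function names are hypothetical. A descriptor for a multispectral image could, for example, concatenate one such histogram per band.

```python
from typing import List, Sequence

Image = List[List[int]]  # one band, 8-bit values (0-255), stored row by row


def histogram(pixels: Sequence[int], bins: int = 4, max_value: int = 256) -> List[float]:
    """Normalized intensity histogram with `bins` equal-width bins."""
    counts = [0] * bins
    bin_width = max_value / bins
    for p in pixels:
        # clamp to the last bin in case a value reaches max_value
        counts[min(int(p // bin_width), bins - 1)] += 1
    return [c / len(pixels) for c in counts]


def global_descriptor(img: Image, bins: int = 4) -> List[float]:
    """Histogram over the whole image."""
    return histogram([p for row in img for p in row], bins)


def window_descriptor(img: Image, top: int, left: int,
                      height: int, width: int, bins: int = 4) -> List[float]:
    """Histogram over a rectangular sub-window of the image."""
    pixels = [img[r][c]
              for r in range(top, top + height)
              for c in range(left, left + width)]
    return histogram(pixels, bins)


img = [[0, 64],
       [128, 255]]
print(global_descriptor(img))              # [0.25, 0.25, 0.25, 0.25]
print(window_descriptor(img, 0, 0, 1, 2))  # [0.5, 0.5, 0.0, 0.0]
```

The sub-window variant captures local content that a single global histogram would average away.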

Query by Example can also be used to search for change information, that is, to discover changes over time. Instead of a single image, the query consists of several images that together represent a specific semantic change over time. This makes it possible to search for trends such as deforestation.
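One simple way to illustrate such a change query, assuming every image has already been reduced to a descriptor vector: represent a before/after pair by the difference of its two descriptors, and rank archive pairs by the Euclidean distance between these change vectors. This is a hypothetical sketch, not the method presented in the talk; the descriptor values (fraction of forest and fraction of bare soil) are made up.

```python
import math
from typing import Dict, List, Sequence, Tuple

Pair = Tuple[Sequence[float], Sequence[float]]  # (descriptor before, descriptor after)


def change_vector(pair: Pair) -> List[float]:
    """Difference between the 'after' and the 'before' descriptor."""
    before, after = pair
    return [a - b for b, a in zip(before, after)]


def query_change(query: Pair, archive: Dict[str, Pair], k: int = 1) -> List[str]:
    """Return the k archive pairs whose change is most similar to the query change."""
    q = change_vector(query)
    return sorted(archive, key=lambda name: math.dist(q, change_vector(archive[name])))[:k]


# descriptors: [fraction of forest, fraction of bare soil] (hypothetical values)
query = ([0.9, 0.1], [0.3, 0.7])  # forest turns into bare soil
archive = {
    "tile_a": ([0.8, 0.2], [0.2, 0.8]),  # similar forest loss
    "tile_b": ([0.5, 0.5], [0.5, 0.5]),  # no change
    "tile_c": ([0.2, 0.8], [0.7, 0.3]),  # regrowth
}
print(query_change(query, archive, k=3))  # ['tile_a', 'tile_b', 'tile_c']
```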

Image Retrieval

Once the features have been extracted, the relevant images have to be retrieved. Both supervised and unsupervised techniques exist for this.

Unsupervised methods often rely on handcrafted features, i.e. methods designed by humans to reduce an image to a smaller yet semantically meaningful representation. A very basic example of such a feature is the number of blue pixels in an image. To get good results, one has to research features that distinguish the data in a way that is relevant to the application at hand, which is a lengthy and difficult process. The advantage of unsupervised methods is that they don't rely on labeled data.
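The "number of blue pixels" feature could look like the sketch below. It assumes RGB pixels with 8-bit channels and a hypothetical rule that a pixel counts as blue when its blue channel exceeds both other channels by a fixed margin (30 here, chosen arbitrarily). The count is normalized to a fraction so that images of different sizes stay comparable.

```python
from typing import List, Tuple

Pixel = Tuple[int, int, int]  # (red, green, blue), each 0-255


def is_blue(pixel: Pixel, margin: int = 30) -> bool:
    """Hypothetical rule: blue clearly dominates red and green."""
    r, g, b = pixel
    return b > r + margin and b > g + margin


def blue_fraction(img: List[List[Pixel]], margin: int = 30) -> float:
    """Handcrafted feature: share of blue pixels in the image."""
    pixels = [p for row in img for p in row]
    return sum(is_blue(p, margin) for p in pixels) / len(pixels)


img = [[(10, 40, 200), (120, 130, 90)],
       [(20, 60, 180), (200, 200, 210)]]
print(blue_fraction(img))  # 0.5
```

A feature like this may help to find images containing lakes, but it cannot tell buildings from roads, which illustrates why features have to be tailored to the application.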

A basic unsupervised method is to compute the Euclidean distance between the descriptor of the query image and the descriptor of every archive image, and to return the closest images as the result.
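A minimal sketch of this exhaustive nearest-neighbour search, assuming descriptors are plain lists of floats keyed by a (hypothetical) image name:

```python
import math
from typing import Dict, List, Sequence


def retrieve(query: Sequence[float], archive: Dict[str, Sequence[float]], k: int) -> List[str]:
    """Rank all archive images by Euclidean distance to the query; return the k closest."""
    return sorted(archive, key=lambda name: math.dist(query, archive[name]))[:k]


archive = {
    "lake": [0.9, 0.1],
    "forest": [0.1, 0.8],
    "river": [0.7, 0.2],
}
print(retrieve([0.8, 0.1], archive, k=2))  # ['lake', 'river']
```

Every query touches every archive descriptor, so the cost grows linearly with the archive size. This is the exhaustive search whose cost motivates the hashing methods discussed below.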

Supervised methods rely on labeled training data. This often leads to better results, but limits them to applications for which good training data is available.

An example of a supervised method is a classifier that labels images as either "relevant" or "irrelevant". It first has to be trained on labeled data, i.e. some images labeled "relevant" and some labeled "irrelevant". Such a classifier can be combined with a deep neural network, such as a Convolutional Neural Network (CNN), to improve its performance.
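To illustrate the train-then-classify workflow, the sketch below uses a nearest-centroid classifier, about the simplest supervised classifier that fits the description. The talk does not say which classifier was used, no CNN is involved here, and the descriptors and labels are made up.

```python
import math
from typing import Dict, List, Sequence, Tuple


def centroid(vectors: List[Sequence[float]]) -> List[float]:
    """Component-wise mean of a list of vectors."""
    return [sum(col) / len(vectors) for col in zip(*vectors)]


class RelevanceClassifier:
    """Nearest-centroid classifier with the labels 'relevant' and 'irrelevant'."""

    def fit(self, samples: List[Tuple[Sequence[float], str]]) -> "RelevanceClassifier":
        # both labels must occur in the training data
        self.centroids = {
            label: centroid([x for x, y in samples if y == label])
            for label in ("relevant", "irrelevant")
        }
        return self

    def predict(self, x: Sequence[float]) -> str:
        """Assign the label whose centroid is closest to x."""
        return min(self.centroids, key=lambda label: math.dist(x, self.centroids[label]))


training = [
    ([0.9, 0.1], "relevant"), ([0.8, 0.2], "relevant"),
    ([0.1, 0.9], "irrelevant"), ([0.2, 0.7], "irrelevant"),
]
clf = RelevanceClassifier().fit(training)
archive: Dict[str, List[float]] = {"img_1": [0.7, 0.3], "img_2": [0.2, 0.8]}
print([name for name, d in archive.items() if clf.predict(d) == "relevant"])  # ['img_1']
```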

One big problem remains: the amount of data is simply too large for an exhaustive search, which would mean comparing the query descriptors with the descriptors of every image in the archive one by one. Specialized hashing methods that index the data can overcome this problem. The presenter gave an overview of common approaches from recent research.

The idea of indexing images via hashing is to store every image in a hash table in which each bucket holds semantically similar images. For a given query image, you then don't need to search the entire archive; you only retrieve the images in the relevant bucket. The question is how to represent the content of an image as a binary hash code.
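The bucket lookup itself is straightforward once a hash function is given. In the sketch below, `toy_hash` is a hypothetical stand-in that sets one bit per descriptor dimension by thresholding at 0.5; the hashing methods discussed next are ways of constructing much better hash functions.

```python
from collections import defaultdict
from typing import Callable, Dict, List, Sequence

HashFn = Callable[[Sequence[float]], str]


def toy_hash(desc: Sequence[float]) -> str:
    """Hypothetical hash function: one bit per dimension, threshold 0.5."""
    return "".join("1" if v >= 0.5 else "0" for v in desc)


def build_index(archive: Dict[str, Sequence[float]], hash_fn: HashFn) -> Dict[str, List[str]]:
    """Put every image into the bucket given by its binary code."""
    table: Dict[str, List[str]] = defaultdict(list)
    for name, desc in archive.items():
        table[hash_fn(desc)].append(name)
    return dict(table)


def lookup(table: Dict[str, List[str]], query: Sequence[float], hash_fn: HashFn) -> List[str]:
    """Only the query's bucket is inspected, not the whole archive."""
    return table.get(hash_fn(query), [])


archive = {"lake_1": [0.9, 0.1], "lake_2": [0.8, 0.2], "forest": [0.1, 0.8]}
index = build_index(archive, toy_hash)
print(lookup(index, [0.7, 0.3], toy_hash))  # ['lake_1', 'lake_2']
```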

One way is Locality Sensitive Hashing (LSH). The idea behind it is that similar input items are likely to be hashed into the same bucket. So, unlike conventional hashing methods, LSH tries to maximize collisions between similar items.
Geometrically, each LSH hash function can be represented as a hyperplane. Depending on which side of the hyperplane a data point lies on, the bit value 0 or 1 is assigned. There is one hash function for every bit. Since very long hash codes are usually needed, this method is impractical. Moreover, the features of satellite images are complex and usually not linearly separable, so LSH on its own doesn't give good results.
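A minimal sketch of random-hyperplane LSH (the sign-random-projection variant, which approximates cosine similarity), assuming the descriptors are roughly centered around zero, because every hyperplane here passes through the origin. Each of the `n_bits` random normal vectors defines one hash function and thus one bit; the class name and parameters are hypothetical.

```python
import random
from typing import List, Sequence


class HyperplaneLSH:
    """One random hyperplane through the origin per bit."""

    def __init__(self, dim: int, n_bits: int, seed: int = 0):
        rng = random.Random(seed)
        # each hyperplane is given by its normal vector, drawn from a standard Gaussian
        self.normals: List[List[float]] = [
            [rng.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(n_bits)
        ]

    def hash(self, x: Sequence[float]) -> str:
        """Bit = 1 if x lies on the positive side of the hyperplane, else 0."""
        return "".join(
            "1" if sum(w * v for w, v in zip(normal, x)) >= 0 else "0"
            for normal in self.normals
        )


def hamming(a: str, b: str) -> int:
    """Number of differing bits between two codes of equal length."""
    return sum(c1 != c2 for c1, c2 in zip(a, b))


lsh = HyperplaneLSH(dim=3, n_bits=16, seed=42)
a, b, c = [1.0, 0.9, -0.2], [0.9, 1.0, -0.1], [-1.0, 0.2, 0.8]
print(hamming(lsh.hash(a), lsh.hash(b)), hamming(lsh.hash(a), lsh.hash(c)))
```

For this scheme the expected fraction of differing bits equals the angle between two descriptors divided by pi, so the Hamming distance only becomes a reliable similarity estimate with many bits, which is one reason LSH needs long codes.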

For this reason, kernel-based hashing algorithms were introduced. They improve on LSH by defining the hash functions in a kernel space. Although they overcome the problems related to LSH, they still represent every image with a single binary code. Because of the complex multispectral nature of the archived images, this strategy is insufficient.
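The sketch below illustrates the general form of a kernel-based hash function: each bit is the sign of a weighted sum of kernel evaluations k(x, a_j) between the input and a set of anchor points, each centered by its mean. This mirrors the form used by kernel-based methods such as supervised hashing with kernels (KSH), but here the weights are random rather than learned, the means are estimated over the anchor set only (a simplification), and the RBF kernel, `gamma` and the class name are assumptions.

```python
import math
import random
from typing import List, Sequence


def rbf(x: Sequence[float], y: Sequence[float], gamma: float = 1.0) -> float:
    """Gaussian (RBF) kernel."""
    return math.exp(-gamma * sum((a - b) ** 2 for a, b in zip(x, y)))


class KernelHasher:
    """Bit k = sign(sum_j w_kj * (rbf(x, anchor_j) - mean_j)), with random w."""

    def __init__(self, anchors: List[Sequence[float]], n_bits: int,
                 gamma: float = 1.0, seed: int = 0):
        self.anchors = [list(a) for a in anchors]
        self.gamma = gamma
        # mean kernel value per anchor, estimated over the anchor set itself
        raw = [self._kernel_features(a) for a in self.anchors]
        self.means = [sum(col) / len(raw) for col in zip(*raw)]
        rng = random.Random(seed)
        self.weights = [[rng.gauss(0.0, 1.0) for _ in self.anchors]
                        for _ in range(n_bits)]

    def _kernel_features(self, x: Sequence[float]) -> List[float]:
        return [rbf(x, a, self.gamma) for a in self.anchors]

    def hash(self, x: Sequence[float]) -> str:
        feats = [f - m for f, m in zip(self._kernel_features(x), self.means)]
        return "".join(
            "1" if sum(w * f for w, f in zip(ws, feats)) >= 0 else "0"
            for ws in self.weights
        )


anchors = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [0.5, 0.5]]
hasher = KernelHasher(anchors, n_bits=8, gamma=2.0, seed=1)
print(hasher.hash([0.1, 0.2]), hasher.hash([0.9, 0.8]))
```

Because the RBF kernel is nonlinear in x, the bit boundaries in the original descriptor space are curved rather than flat hyperplanes, which is what lets such methods handle data that is not linearly separable.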

Multi-code hashing assigns more than one hash code to an image to account for the complexity of multispectral images. It represents image regions with specific binary codes, each of which relates to a different class present in the image (for example buildings, roads, forests, lakes). The procedure is similar to that of the other hashing methods; the main difference is that descriptors of primitives (segments of the image) are used instead of global descriptors. Each primitive's descriptor is transformed into a hash code, and these separate hash codes together form the binary representation of the image. Finally, the relevant images are retrieved based on these multi-codes.
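A minimal sketch of retrieval with multi-codes, assuming the per-segment codes have already been computed. The matching rule is an assumption for illustration only: each query code is matched to the closest code of the archive image by Hamming distance, and these distances are averaged. The 4-bit codes per class are made up.

```python
from typing import Dict, List


def hamming(a: str, b: str) -> int:
    """Number of differing bits between two codes of equal length."""
    return sum(c1 != c2 for c1, c2 in zip(a, b))


def multi_code_distance(query_codes: List[str], image_codes: List[str]) -> float:
    """Average, over the query's segment codes, of the distance to the best-matching code."""
    return sum(min(hamming(q, c) for c in image_codes) for q in query_codes) / len(query_codes)


def retrieve(query_codes: List[str], archive: Dict[str, List[str]], k: int) -> List[str]:
    """Return the k archive images with the smallest multi-code distance."""
    return sorted(archive, key=lambda name: multi_code_distance(query_codes, archive[name]))[:k]


# hypothetical segment codes: buildings 1100, roads 1010, forest 0011, lake 0101
archive = {
    "city": ["1100", "1010", "0101"],
    "rural": ["0011", "1010"],
    "wilderness": ["0011", "0101"],
}
query = ["1100", "1010"]  # an image showing buildings and roads
print(retrieve(query, archive, k=2))  # ['city', 'rural']
```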

Deep neural networks can be used for supervised hashing. Their main advantage is that they don't need handcrafted features. Their weakness is that they need an enormous amount of labeled data to avoid overfitting. One way around this obstacle is to use publicly available datasets that are already labeled. The presenter gave the example of training the network on an archive of regular aerial images instead of satellite images. If the training data is semantically close enough to the data the network later has to work on, this can give sufficiently precise results.

BigEarthNet

BigEarthNet is the largest archive of labeled satellite images available. It was created, in part, by the presenter and includes about 600,000 images. A large, labeled benchmark dataset is an important tool for developing algorithms that work on satellite images.

Previously, only pre-labeled, general-purpose image databases such as ImageNet were available. Because these databases contain many images that are quite different from satellite images, algorithms trained on them perform worse on satellite data.

The presenter is a coauthor of a paper describing the creation and uses of BigEarthNet. [1]

Other Research

The presenter then gave an overview of other research topics relevant to satellite image analysis:

Image Captioning

One topic the research group is working on is automatic image captioning for these huge archives, which faces the typical challenges of big data. The goal is to automatically analyze this huge number of images and describe their semantic content. This makes it possible to ask the system for a semantic description of a given area.

Image Super-Resolution

The goal of this research is to improve the spatial resolution of the images. In this context, spatial resolution refers to the area each pixel covers on the ground. In the highest-resolution Sentinel-2 bands, each pixel covers an area of 10 m x 10 m on the Earth's surface. The research group is investigating how deep neural networks can help to achieve a finer spatial resolution.
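To make the 10 m figure concrete, the following arithmetic counts pixels for a 1 km x 1 km area at 10 m and at a hypothetical twofold improvement to 5 m pixels (an illustrative target, not a result reported in the talk):

```latex
\begin{aligned}
\text{ground area per pixel at } 10\,\text{m} &= 10\,\text{m} \times 10\,\text{m} = 100\,\text{m}^2 \\
\text{pixels per } 1\,\text{km}^2 \text{ at } 10\,\text{m} &= \frac{10^6\,\text{m}^2}{100\,\text{m}^2} = 10^4 \\
\text{pixels per } 1\,\text{km}^2 \text{ at } 5\,\text{m} &= \frac{10^6\,\text{m}^2}{25\,\text{m}^2} = 4 \times 10^4
\end{aligned}
```

Halving the pixel size therefore quadruples the number of pixels: a super-resolution network has to produce four output pixels for every observed input pixel.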

Image Classification

In pixel-based classification, each pixel of the image is assigned to a land cover class, such as water, trees or roads. The resulting land cover maps make it possible to automatically analyze changes on the Earth's surface.
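A minimal sketch of pixel-based classification, assuming two-band pixels (red and near-infrared reflectance) and made-up reference spectra per class. Each pixel is assigned to the class with the nearest reference spectrum (a minimum-distance classifier, chosen here for simplicity). Comparing the maps of two dates pixel by pixel then yields the changed locations, an approach known as post-classification comparison.

```python
import math
from typing import Dict, List, Tuple

Pixel = Tuple[float, float]  # (red, near-infrared) reflectance

# hypothetical reference spectra, not measured values
SIGNATURES: Dict[str, Pixel] = {
    "water": (0.05, 0.02),
    "trees": (0.04, 0.40),
    "road": (0.20, 0.25),
}


def classify_pixel(pixel: Pixel) -> str:
    """Minimum-distance classification against the reference spectra."""
    return min(SIGNATURES, key=lambda cls: math.dist(pixel, SIGNATURES[cls]))


def land_cover_map(img: List[List[Pixel]]) -> List[List[str]]:
    """Classify every pixel of the image."""
    return [[classify_pixel(p) for p in row] for row in img]


def changed_pixels(map_a: List[List[str]], map_b: List[List[str]]) -> List[Tuple[int, int]]:
    """(row, column) of every pixel whose class differs between the two maps."""
    return [(r, c)
            for r, (row_a, row_b) in enumerate(zip(map_a, map_b))
            for c, (a, b) in enumerate(zip(row_a, row_b))
            if a != b]


t1 = [[(0.05, 0.03), (0.05, 0.38)], [(0.19, 0.24), (0.03, 0.42)]]
t2 = [[(0.05, 0.03), (0.21, 0.26)], [(0.19, 0.24), (0.03, 0.42)]]
map_1, map_2 = land_cover_map(t1), land_cover_map(t2)
print(map_1)                         # [['water', 'trees'], ['road', 'trees']]
print(changed_pixels(map_1, map_2))  # [(0, 1)]
```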

Change Detection

Satellite images can be used to track and analyze changes in geography and the environment over time; deforestation is a good example. Research in this area looks for ways to efficiently find such changes in large datasets and to label them correctly.
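One classic way to find such changes without classifying the images first is change vector analysis: compare the spectral vector of each pixel at two dates and flag the pixel as changed when the magnitude of the difference exceeds a threshold. The sketch assumes co-registered images of equal size with two-band pixels; the reflectance values are hypothetical, and the threshold of 0.1 is arbitrary and would have to be chosen for real data.

```python
import math
from typing import List, Sequence


def change_map(img_t1: List[List[Sequence[float]]],
               img_t2: List[List[Sequence[float]]],
               threshold: float) -> List[List[bool]]:
    """True where the spectral change vector is longer than `threshold`."""
    return [[math.dist(p1, p2) > threshold for p1, p2 in zip(row1, row2)]
            for row1, row2 in zip(img_t1, img_t2)]


# (red, near-infrared) reflectance per pixel, hypothetical values
before = [[(0.04, 0.40), (0.05, 0.02)]]  # forest, water
after = [[(0.18, 0.22), (0.05, 0.03)]]   # cleared forest, water
print(change_map(before, after, threshold=0.1))  # [[True, False]]
```

The direction of the change vector (for example, a drop in near-infrared together with a rise in red) can additionally help to label what kind of change occurred.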

Biophysical Parameter Estimation

Remote sensing data can be used to estimate the biophysical properties of a given area of land. For example, information about the growth status of crops can be extracted from overhead imagery [2].
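A widely used and very simple proxy for crop growth is the Normalized Difference Vegetation Index, NDVI = (NIR - Red) / (NIR + Red), computed per pixel from the near-infrared and red bands (on Sentinel-2, bands B8 and B4). It is swapped in here only as an illustration: it is not claimed to be the method used by the presenter's group, and the reflectance values are made up.

```python
def ndvi(nir: float, red: float) -> float:
    """Normalized Difference Vegetation Index for one pixel (in [-1, 1] for non-negative inputs)."""
    if nir + red == 0:
        raise ValueError("NDVI is undefined when NIR + red reflectance is 0")
    return (nir - red) / (nir + red)


dense_crop = ndvi(nir=0.40, red=0.04)
sparse_crop = ndvi(nir=0.25, red=0.20)
print(round(dense_crop, 2), round(sparse_crop, 2))  # 0.82 0.11
```

Higher values indicate denser green vegetation, so tracking NDVI over a season is one common way to follow crop growth.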

Summary

The presenter gave an interesting talk about satellite imaging and how it can be used to automatically analyze the Earth's surface. The huge datasets required for widespread coverage bring many challenges, which the presenter and her research group are working to overcome. BigEarthNet offers pre-labeled images that are free to use for everyone who needs this kind of data.

Related Work

[1] G. Sumbul, M. Charfuelan, B. Demir and V. Markl, "BigEarthNet: A Large-Scale Benchmark Archive for Remote Sensing Image Understanding," IGARSS 2019 - 2019 IEEE International Geoscience and Remote Sensing Symposium, Yokohama, Japan, 2019.

[2] P. V. Talreja, S. S. Durbha and A. V. Potnis, "On-Board Biophysical Parameters Estimation Using High Performance Computing," IGARSS 2018 - 2018 IEEE International Geoscience and Remote Sensing Symposium, Valencia, 2018, pp. 5445-5448. doi: 10.1109/IGARSS.2018.8518403