Hasso-Plattner-Institut

Integrating structured transformations into probabilistic generative models

Eshant English

In the Research School since December 2021

Chair Digital Health & Machine Learning

Office: Campus III Building G2, Room G-2.1.32
Email: eshant.english(at)hpi.de

Supervisor: Prof. Dr. Christoph Lippert

 

Introduction

  I investigate structured transformations and their integration into probabilistic generative models, with a strong focus on improving interpretability, tractable optimisation, and uncertainty quantification. Whilst generative models such as Normalising Flows offer efficient sampling and density estimation alongside their generative capabilities, their expressiveness is limited by invertibility constraints. They (and other generative models) can utilise structured transformations to produce better visualisations.

  In my current work, we investigate a new architecture for Normalising Flows, a type of generative model, that creates sharper images while remaining computationally efficient.

 

Background

Normalising flows [4] are invertible neural networks used for generative modelling. They use the change-of-variables formula to map ("normalise") an unknown data distribution to a distribution of choice, usually a multivariate standard Gaussian. Because the properties of the normalised distribution are well known and the mapping is invertible, the same formula enables both density estimation and sampling [1].
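As a minimal illustration of the change-of-variables idea, the sketch below uses a one-dimensional affine map as the flow. The toy flow and all function names are assumptions chosen for illustration; they are not taken from any of the architectures discussed here. For an invertible map z = f(x) and a standard Gaussian base density, the log-density of x is the base log-density of f(x) plus log |df/dx|, and sampling draws z from the base distribution and applies the inverse map.

```python
import math
import random

def forward(x, mu, sigma):
    """Map data x to latent z = (x - mu) / sigma; also return log|dz/dx|."""
    z = (x - mu) / sigma
    log_abs_det = -math.log(sigma)  # dz/dx = 1 / sigma, with sigma > 0
    return z, log_abs_det

def inverse(z, mu, sigma):
    """Map latent z back to data space."""
    return mu + sigma * z

def log_standard_normal(z):
    """Log-density of the standard Gaussian base distribution."""
    return -0.5 * (z * z + math.log(2.0 * math.pi))

def log_density(x, mu, sigma):
    """Change of variables: log p(x) = log N(f(x); 0, 1) + log|df/dx|."""
    z, log_abs_det = forward(x, mu, sigma)
    return log_standard_normal(z) + log_abs_det

def sample(mu, sigma, rng):
    """Sampling: draw z from the base distribution, then invert the flow."""
    return inverse(rng.gauss(0.0, 1.0), mu, sigma)

rng = random.Random(0)
print(log_density(1.3, mu=0.5, sigma=2.0))
print(sample(0.5, 2.0, rng))
```

In this toy case the result coincides with the log-density of a Gaussian with mean mu and standard deviation sigma, which makes the formula easy to check by hand.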

Currently, Glow [3] is the state-of-the-art Normalising Flow model for image synthesis. However, it is computationally inefficient: it requires approx. 90 million parameters for 32×32 image generation. In comparison, the state-of-the-art StyleGAN [2] has approx. 30 million parameters and generates images of better quality.

 

 

Current Work

In our work, we propose an architecture that uses only 7.5 million parameters for 32×32 image generation and generates images whose quality is not inferior to that of the Glow architecture [3].

The core of our approach is to introduce parameter sharing into the Normalising Flow architecture to make it computationally efficient. For image data, parameter sharing is typically achieved via convolution operations, but we cannot use them because they restrict invertibility, an essential property of Normalising Flows. Instead, we introduce parameter sharing through the idea of mixing, as done by Tolstikhin et al. [5] for the MLP-Mixer.

We first cut the image into patches (or bands/stripes). Each patch is then linearly projected into a vector, which yields a sequence of vectors. This gives us our input as a 2D matrix of shape "patches x channels". We then define two kinds of normalising flows: a channel-mixing flow and a patch-mixing flow. The channel-mixing flow captures interactions between different channels: it takes individual rows of the input matrix and operates on each patch separately. The patch-mixing flow captures interactions between different patches, i.e., it takes individual columns of the input matrix and operates on each channel separately. These two operations are applied alternately, multiple times, which enables interaction between all entries of the matrix. Because we use the same channel-mixing flow for all rows of the input matrix and the same patch-mixing flow for all columns, we ensure parameter sharing.
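The patch matrix and the two mixing steps can be sketched in NumPy as follows. This is only an illustration under stated assumptions: each patch is flattened rather than passed through a learned projection, and each flow is replaced by a single invertible lower-triangular linear map. The function names (to_patches, from_patches, make_invertible, channel_mix, patch_mix) are hypothetical and do not come from the actual implementation.

```python
import numpy as np

def to_patches(img, p):
    """Split an (H, W, C) image into non-overlapping p x p patches and flatten
    each patch into one row: result has shape (num_patches, p * p * C)."""
    H, W, C = img.shape
    x = img.reshape(H // p, p, W // p, p, C).transpose(0, 2, 1, 3, 4)
    return x.reshape((H // p) * (W // p), p * p * C)

def from_patches(X, H, W, C, p):
    """Exact inverse of to_patches (a pure rearrangement of entries)."""
    x = X.reshape(H // p, W // p, p, p, C).transpose(0, 2, 1, 3, 4)
    return x.reshape(H, W, C)

def make_invertible(n, rng):
    """Assumed stand-in for a flow: lower-triangular matrix with a positive
    diagonal, so it is invertible and its log-determinant is cheap."""
    lower = np.tril(rng.normal(scale=0.01, size=(n, n)), k=-1)
    return lower + np.diag(np.exp(rng.normal(scale=0.1, size=n)))

def channel_mix(X, A):
    """Apply the same map A to every row (patch) of X."""
    log_det = X.shape[0] * np.log(np.diag(A)).sum()  # A is reused once per row
    return X @ A.T, log_det

def patch_mix(X, B):
    """Apply the same map B to every column (channel) of X."""
    log_det = X.shape[1] * np.log(np.diag(B)).sum()  # B is reused once per column
    return B @ X, log_det

rng = np.random.default_rng(0)
img = rng.normal(size=(32, 32, 3))
X = to_patches(img, p=4)                # shape (64, 48): patches x channels
A = make_invertible(X.shape[1], rng)    # one shared channel-mixing map
B = make_invertible(X.shape[0], rng)    # one shared patch-mixing map

Y, ld_channel = channel_mix(X, A)
Y, ld_patch = patch_mix(Y, B)

# Invert the two steps in reverse order and rebuild the image.
X_rec = np.linalg.solve(A, np.linalg.solve(B, Y).T).T
img_rec = from_patches(X_rec, 32, 32, 3, p=4)
print(X.shape, np.allclose(img_rec, img))
```

With 64 patches and 48 channels, the two shared matrices hold 64² + 48² = 6,400 entries, whereas a single dense linear map over all 3,072 matrix entries would hold 3,072² (about 9.4 million). These figures describe the toy linear maps only, not the parameter counts of the actual models.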


One layer of our mixer-flow consists of a channel-mixing flow followed by batch normalisation, followed by a patch-mixing flow, which is again followed by batch normalisation. The authors of MLP-Mixer [5] call this idea mixing; we have therefore given our architecture the working name Mixer-Flow (subject to change).
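The layer structure described above can be sketched as a composition of four invertible steps whose log-determinants add up. The sketch makes several assumptions that are not from the source: the two flows are plain invertible matrices A and B, batch normalisation uses the statistics of the current batch and treats them as constants when computing the log-determinant, and all names (batch_norm, layer_forward, layer_inverse, EPS) are hypothetical.

```python
import numpy as np

EPS = 1e-5  # assumed small constant for numerical stability

def batch_norm(X):
    """Normalise every matrix entry over the batch axis. Returns the output,
    the per-sample log-determinant, and the statistics needed to invert."""
    mean, var = X.mean(axis=0), X.var(axis=0)
    Y = (X - mean) / np.sqrt(var + EPS)
    log_det = -0.5 * np.log(var + EPS).sum()
    return Y, log_det, (mean, var)

def batch_norm_inverse(Y, stats):
    mean, var = stats
    return Y * np.sqrt(var + EPS) + mean

def layer_forward(X, A, B):
    """X has shape (batch, patches, channels); A mixes channels, B mixes patches."""
    n_patches, n_channels = X.shape[1], X.shape[2]
    Y = X @ A.T                                    # channel mixing, shared over rows
    log_det = n_patches * np.linalg.slogdet(A)[1]
    Y, ld, stats1 = batch_norm(Y)
    log_det += ld
    Y = B @ Y                                      # patch mixing, shared over columns
    log_det += n_channels * np.linalg.slogdet(B)[1]
    Y, ld, stats2 = batch_norm(Y)
    log_det += ld
    return Y, log_det, (stats1, stats2)

def layer_inverse(Y, A, B, stats):
    """Undo the four steps of layer_forward in reverse order."""
    stats1, stats2 = stats
    Y = batch_norm_inverse(Y, stats2)
    Y = np.linalg.inv(B) @ Y
    Y = batch_norm_inverse(Y, stats1)
    return Y @ np.linalg.inv(A).T

rng = np.random.default_rng(0)
N, P, C = 16, 4, 6
X = rng.normal(size=(N, P, C))
A = np.eye(C) + 0.1 * rng.normal(size=(C, C))      # near-identity, invertible here
B = np.eye(P) + 0.1 * rng.normal(size=(P, P))

Y, log_det, stats = layer_forward(X, A, B)
X_rec = layer_inverse(Y, A, B, stats)
print(np.allclose(X_rec, X), float(log_det))
```

Stacking several such layers and summing their log-determinants gives the total correction term needed by the change-of-variables formula.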

Future direction: The performance of our architecture depends strongly on the patch-extraction strategy, and we have found empirical evidence that changing the patch-extraction strategy slightly after every layer can help produce sharper images.
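The text does not specify how the extraction strategy is varied, so the following is purely a hypothetical example of what a slight, layer-dependent change could look like: the image is cut into horizontal stripes, and before each layer the rows are cyclically shifted by a layer-dependent offset. A cyclic shift only permutes entries, so it is invertible and contributes nothing to the log-determinant. The names to_stripes, shifted_stripes and undo_shift are assumptions.

```python
import numpy as np

def to_stripes(img, s):
    """Cut an (H, W, C) image into horizontal stripes of height s and flatten
    each stripe into one row: result has shape (H // s, s * W * C)."""
    H, W, C = img.shape
    return img.reshape(H // s, s * W * C)

def shifted_stripes(img, s, layer):
    """Hypothetical variation: shift the rows cyclically by a layer-dependent
    offset before cutting, so stripe boundaries move from layer to layer."""
    offset = layer % s
    return to_stripes(np.roll(img, -offset, axis=0), s), offset

def undo_shift(X, shape, offset):
    """Rebuild the image from the stripe matrix and undo the cyclic shift."""
    return np.roll(X.reshape(shape), offset, axis=0)

img = np.arange(8 * 4 * 1, dtype=float).reshape(8, 4, 1)
for layer in range(4):
    X, offset = shifted_stripes(img, s=2, layer=layer)
    restored = undo_shift(X, img.shape, offset)
    print(layer, offset, X.shape, np.array_equal(restored, img))
```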

 

References

1. L. Dinh, J. Sohl-Dickstein, and S. Bengio. “Density estimation using Real NVP”. In: ArXiv abs/1605.08803 (2017).

2. T. Karras, S. Laine, and T. Aila. “A Style-Based Generator Architecture for Generative Adversarial Networks”. In: 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) (2019), pages 4396–4405.

3. D. P. Kingma and P. Dhariwal. “Glow: Generative Flow with Invertible 1x1 Convolutions”. In: NeurIPS. 2018.

4. I. Kobyzev, S. Prince, and M. A. Brubaker. “Normalizing Flows: An Introduction and Review of Current Methods”. In: IEEE Transactions on Pattern Analysis and Machine Intelligence 43 (2021), pages 3964–3979.

5. I. Tolstikhin, N. Houlsby, A. Kolesnikov, L. Beyer, X. Zhai, T. Unterthiner, J. Yung, A. Steiner, D. M. Keysers, J. Uszkoreit, M. Lučić, and A. Dosovitskiy. “MLP-Mixer: An All-MLP Architecture for Vision”. In: NeurIPS 2021 (poster). 2021.

Teaching Activities

1. Teaching tutorials for the Deep Learning lecture (B.Sc. and M.Sc.) in the summer term 2023.

2. Supervising a Master's thesis (winter term 2023-2024) on creating a novel convolutional normalising flow model.

3. Supervising a project for the Advanced Machine Learning seminar (winter term 2023-2024) that exploits diffusion-model-based representations for controlled image generation.

Recent Works

1. Kernelised Normalising Flows (under review; a preprint is available on arXiv).

2. MixerFlow for Image Modelling (under review; a preprint is available on arXiv).