Generating Synthetic Medical Data
A major use of federated learning is the (virtual) aggregation of distributed datasets in order to obtain enough data to train deep networks. An alternative is to generate synthetic data, which does not constitute personal data in the sense that it does not correspond to any particular individual. Such a dataset can then serve as the sole basis for centralised model training, or augment existing private datasets. Generative models such as Variational Autoencoders (VAEs) or Generative Adversarial Nets (GANs) have shown promise in creating high-quality image data from noise. Within the federated learning framework, these models can be trained without violating medical data privacy.
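To make the federated training setting concrete, the following is a minimal sketch of the server-side aggregation step (federated averaging), assuming numpy; the function name `fed_avg` and the flat-array representation of model parameters are illustrative assumptions, not part of any particular framework.

```python
import numpy as np

def fed_avg(client_weights, client_sizes):
    """Weighted average of client model parameters (FedAvg-style).

    client_weights: list of parameter arrays, one per client,
                    all with the same shape.
    client_sizes:   number of local training samples per client,
                    used to weight the average.
    """
    total = sum(client_sizes)
    return sum(w * (n / total) for w, n in zip(client_weights, client_sizes))

# Two hypothetical clients with equal data volume: the aggregated
# parameters are the plain mean of the client parameters.
aggregated = fed_avg([np.array([0.0, 4.0]), np.array([2.0, 0.0])], [10, 10])
print(aggregated)
```

In a full system this step would repeat every communication round, with clients training locally on their private data in between; only the parameter arrays ever leave the clients.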
Our focus so far has been on VAE models, which consist of two sub-networks: the encoder transforms input data into a latent representation, while the decoder reconstructs the original image from that representation. Unlike regular Autoencoders (AEs), VAEs model the latent representation as a Gaussian distribution: the encoder emits a mean and a covariance, and the decoder receives a sample drawn from the corresponding distribution. This allows the VAE to generate new data, instead of merely reconstructing training samples.
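The sampling step between encoder and decoder is usually implemented with the reparameterization trick, so that gradients can flow through the random draw. Below is a minimal sketch, assuming numpy; the encoder outputs (`mu`, `log_var`) and the helper name `reparameterize` are illustrative.

```python
import numpy as np

def reparameterize(mu, log_var, rng):
    """Sample z ~ N(mu, sigma^2) via the reparameterization trick.

    Rather than sampling z directly (which would not be differentiable
    with respect to the encoder outputs), we draw eps ~ N(0, I) and
    shift/scale it by the predicted mean and standard deviation.
    """
    sigma = np.exp(0.5 * log_var)       # log-variance -> std deviation
    eps = rng.standard_normal(mu.shape)  # noise independent of parameters
    return mu + sigma * eps

rng = np.random.default_rng(0)
# Hypothetical encoder outputs: batch of 4 samples, latent dimension 2
mu = np.zeros((4, 2))
log_var = np.zeros((4, 2))  # log_var = 0 means sigma = 1
z = reparameterize(mu, log_var, rng)
print(z.shape)  # (4, 2)
```

At generation time, the encoder is discarded entirely: new images are produced by feeding samples from a standard normal distribution directly into the trained decoder.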
Both components are typically trained and synchronised using federated learning, with differential privacy added to ensure that no exact replicas of patient data are synthesised. Depending on the number of clients and the amount of data, however, the addition of differential privacy can substantially weaken the model and make the synthesised data less usable.
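A common way to add differential privacy in this setting is to clip each client's update to a fixed norm and add calibrated Gaussian noise before aggregation (in the style of DP-SGD). The sketch below assumes numpy; the function name `privatize_update` and the parameter values are illustrative, and a real deployment would derive the noise multiplier from a target privacy budget.

```python
import numpy as np

def privatize_update(updates, clip_norm=1.0, noise_mult=1.1, rng=None):
    """Clip per-client updates and add Gaussian noise before averaging.

    Clipping bounds each client's influence on the aggregate, and the
    noise (scaled by clip_norm) masks any single client's contribution.
    """
    rng = rng or np.random.default_rng(0)
    clipped = []
    for u in updates:
        norm = np.linalg.norm(u)
        # Scale down updates whose norm exceeds clip_norm; leave others intact
        clipped.append(u * min(1.0, clip_norm / (norm + 1e-12)))
    mean = np.mean(clipped, axis=0)
    noise = rng.normal(0.0, noise_mult * clip_norm / len(updates),
                       size=mean.shape)
    return mean + noise

# Three hypothetical client updates of varying magnitude
updates = [np.array([3.0, 4.0]), np.array([0.1, 0.2]), np.array([-1.0, 0.5])]
print(privatize_update(updates))
```

The trade-off mentioned above is visible directly in the parameters: a tighter `clip_norm` and larger `noise_mult` give stronger privacy guarantees, but with few clients the noise term dominates the averaged update and degrades the generative model.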