In many distributed computing setups, machine learning training data is spread across machines in an imbalanced way, such that some machines hold examples from only a few classes. These imbalances lead to poor classification results. In this project, we want to study methods that select a small fraction of every machine's local data and send it around the cluster during training, choosing these fractions so as to counteract the imbalances. Can the total amount of data exchanged this way be smaller than the amount moved by a random shuffle of all data at the beginning of training?
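One way to make the question concrete is a selection rule that weights each local sample by the global scarcity of its class, so machines preferentially send their rare classes. The following is a minimal Python sketch under assumed names (`plan_exchanges`, inverse-frequency scarcity weighting, a fixed per-machine budget `frac`); it is an illustrative heuristic, not the project's method:

```python
import numpy as np

def plan_exchanges(labels_per_machine, frac=0.1, seed=0):
    """For each machine, pick up to `frac` of its local samples to broadcast,
    preferring classes that are globally rare (hypothetical heuristic:
    weight = inverse global class frequency). Returns per-machine index arrays."""
    rng = np.random.default_rng(seed)
    all_labels = np.concatenate(labels_per_machine)
    classes, counts = np.unique(all_labels, return_counts=True)
    scarcity = {c: 1.0 / n for c, n in zip(classes, counts)}  # rarer => heavier
    plans = []
    for labels in labels_per_machine:
        budget = max(1, int(frac * len(labels)))
        weights = np.array([scarcity[l] for l in labels], dtype=float)
        probs = weights / weights.sum()
        idx = rng.choice(len(labels), size=min(budget, len(labels)),
                        replace=False, p=probs)
        plans.append(np.sort(idx))
    return plans

# Two machines with opposite skew: machine 0 holds mostly class 0, machine 1 mostly class 1.
m0 = np.array([0] * 90 + [1] * 10)
m1 = np.array([1] * 90 + [0] * 10)
plans = plan_exchanges([m0, m1], frac=0.1)
sent = sum(len(p) for p in plans)  # 20 samples total; a full shuffle would move up to 200
```

With a 10% budget, each machine ships 10 of its 100 samples, so 20 samples cross the network in total, whereas a full random shuffle would relocate on the order of the entire dataset. Comparing accuracy under such budgets against the shuffled baseline is exactly the trade-off the project asks about.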