The majority of current state-of-the-art approaches try to learn disentangled representations by operating on the priors over the latent space. In contrast, we investigate the effect of regularizing the model architecture itself by reducing the number of connections between the hidden units of the networks. Dense connectivity increases the number of parameters in a model, and thus its potential expressive power. However, recent works have shown that in practice a large fraction of the weights remains unused and can be removed without sacrificing performance.
We argue that sparsely connected features might be desirable when learning disentangled representations and propose two techniques for obtaining such models: Random Masking and Learnable L1 Masking. We evaluate these techniques on standard disentanglement benchmarks and perform ablation studies to identify the factors crucial for improving performance.
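As a rough illustration only (not the paper's implementation), the sketch below shows one way a fixed random binary mask and a learnable mask regularized with an L1 penalty could be applied to the weights of a fully connected layer; the class names, sparsity level, and penalty coefficient are illustrative assumptions.

```python
# Illustrative sketch: a fixed random mask and a learnable L1-penalized mask
# over a linear layer's weights. Names and hyperparameters are assumptions,
# not taken from the paper.
import torch
import torch.nn as nn


class RandomMaskedLinear(nn.Module):
    """Linear layer whose connections are pruned by a fixed random binary mask."""

    def __init__(self, in_features, out_features, sparsity=0.5):
        super().__init__()
        self.linear = nn.Linear(in_features, out_features)
        # Fixed mask: each connection is kept with probability (1 - sparsity).
        mask = (torch.rand(out_features, in_features) > sparsity).float()
        self.register_buffer("mask", mask)

    def forward(self, x):
        return nn.functional.linear(x, self.linear.weight * self.mask, self.linear.bias)


class L1MaskedLinear(nn.Module):
    """Linear layer with a learnable mask pushed toward sparsity by an L1 term."""

    def __init__(self, in_features, out_features):
        super().__init__()
        self.linear = nn.Linear(in_features, out_features)
        # Mask values start at 1 (fully connected) and are trained jointly with
        # the weights; the L1 penalty below encourages them toward zero.
        self.mask = nn.Parameter(torch.ones(out_features, in_features))

    def forward(self, x):
        return nn.functional.linear(x, self.linear.weight * self.mask, self.linear.bias)

    def l1_penalty(self):
        return self.mask.abs().sum()


# Usage sketch: add the mask penalty to the task loss during training, e.g.
#   loss = task_loss + l1_coeff * layer.l1_penalty()
```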