Towards a Globally Optimal Approach for Learning Deep Unsupervised Models
In today's age of ubiquitous sensors and the Internet, enormous amounts of data are being generated continuously from various sources. This calls for automatic techniques that can discover the internal structure of the data and thereby learn representations (features) from it. Deep learning is one such method, and it has achieved significant success in various domains during the past decade. This success is largely attributed to the idea of greedy layerwise pre-training, which decomposes a deep network into a stack of single layers and trains each layer from bottom to top in an unsupervised fashion. However, this greedy algorithm has a major drawback: the higher layers have no direct knowledge of the original data distribution. Therefore, if the bottom layer fails to preserve all the information about the data distribution, it precludes the possibility of learning a globally optimal model. In this thesis we tackle this problem through a joint training scheme; we then extend the idea further and demonstrate it on several computer vision applications. First, we propose a joint training scheme that adjusts the lower- and higher-level models together, thereby alleviating the burden on both. It enables the higher layers of the model to observe the input and thus fit the distribution more accurately. Our joint training method makes global optimization possible. It is an effective, general method that can train any multi-layer neural network end-to-end in an unsupervised fashion using autoencoders. A single global reconstruction objective is proposed and jointly optimized, such that the objective of the individual autoencoder at each layer acts as a local, layer-level regularizer, leading to accurate data models.
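The joint objective described above can be sketched in a few lines. This is a minimal illustrative toy, not the thesis's actual implementation: it assumes a two-layer stack of tied-weight sigmoid autoencoders, a squared-error reconstruction loss, and a hypothetical weight `lam` balancing the global term against the per-layer regularizers.

```python
import numpy as np

rng = np.random.default_rng(0)

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

# Encoder weights for two stacked layers (decoders use tied weights W.T).
# The sizes (16 -> 8 -> 4) are arbitrary toy choices.
W1 = rng.normal(scale=0.1, size=(16, 8))
W2 = rng.normal(scale=0.1, size=(8, 4))

def joint_loss(x, lam=0.1):
    """Single global reconstruction objective plus local, layer-level
    autoencoder objectives acting as regularizers (hypothetical form)."""
    h1 = sigmoid(x @ W1)            # first-level representation
    h2 = sigmoid(h1 @ W2)           # second-level representation
    # Global term: decode the top representation all the way back to the
    # input, so higher layers are trained against the data distribution.
    h1_hat = sigmoid(h2 @ W2.T)
    x_hat = sigmoid(h1_hat @ W1.T)
    global_term = np.mean((x - x_hat) ** 2)
    # Local regularizers: each single autoencoder must also reconstruct
    # its own input, as in greedy layerwise training.
    local1 = np.mean((x - sigmoid(h1 @ W1.T)) ** 2)
    local2 = np.mean((h1 - sigmoid(h2 @ W2.T)) ** 2)
    return global_term + lam * (local1 + local2)

x = rng.random((32, 16))            # a toy batch of 32 inputs
print(joint_loss(x))
```

In a full training loop this scalar would be minimized by gradient descent over all layers simultaneously, which is what distinguishes the joint scheme from training each autoencoder in isolation.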
We empirically evaluate the performance of this joint training scheme and observe that it not only learns a better data model but also learns better high-level representations, which highlights its potential for unsupervised feature learning. We then turn our attention to a more specific domain: image modeling. We hypothesize that modeling image regularity at a more abstract level is more beneficial for higher-level computer vision tasks, and we test this hypothesis on scene parsing. We present an end-to-end framework for outdoor scene region decomposition that is learned on a small set of images and generalizes well to multiple datasets. We discuss the different aspects of the framework, especially a generalized variational inference method that yields better approximations to the true marginals of a graphical model. It is this inference that allows us to take advantage of the higher-level information provided by the Markov random field. Experimentally, we evaluate the framework on four widely used scene parsing datasets and demonstrate state-of-the-art performance. Finally, we combine the previous two ideas and further extend the model to generative modeling of high-resolution images. We present a deep unsupervised image model that can model high-resolution images and can be trained jointly. This model applies principles from generative stochastic networks and variational autoencoders, so that the resulting model is a proper probabilistic model from which one can easily draw samples. We evaluate this model quantitatively on several popular datasets that are frequently used for evaluating generative models. In addition, we show qualitative results by generating high-resolution samples from the learned model.