Sequential classification using deep learning
Sequential classification is the task of labeling a sequence of inputs. It is important in various domains, such as natural language processing, handwriting recognition and speech recognition. Previous approaches have addressed the following design issues in one way or another: (1) objective function and optimization, e.g., adding prior constraints to the maximum-likelihood objective to mitigate overfitting; (2) feature design or learning, where features are conventionally hand-designed rather than learned automatically from data; and (3) computational complexity, which must be considered because the label space grows exponentially with sequence length. Recent advances in deep learning have attracted great attention in representation learning, because deep models learn mappings that capture meaningful structure in the code space. In this work, we extend deep learning methods to address these design issues. Deep learning has shown significant improvement on multi-class problems, where the output space is limited; the output space of sequential labeling, however, grows exponentially with the length of the observation sequence. We explore conditional random fields (CRFs) with deep feature learning for sequential labeling. On the one hand, the Markov property of the statistical dependence among labels can be leveraged to improve labeling accuracy; on the other hand, better representations can be learned through multiple layers of nonlinear mappings. We present a mixture objective function and an effective online learning method for updating model parameters, and demonstrate the effectiveness of our approaches on four datasets (OCR, Penn Treebank, FAQ and CB513). Correlations between sequential observations can also be exploited in classification tasks, and we consider how to leverage a taxonomy of labels to improve classification performance.
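To make the combination of learned features and label dependence concrete, here is a minimal sketch of the linear-chain CRF objective referred to above, written in NumPy. The per-position emission scores stand in for the output of a deep feature extractor (assumed given here), and the partition function is computed with the forward algorithm in log space, so the cost is linear in sequence length rather than exponential. This is an illustrative sketch, not the thesis's implementation; the function and variable names are chosen for exposition.

```python
import numpy as np

def log_sum_exp(v):
    # Numerically stable log(sum(exp(v))).
    m = v.max()
    return m + np.log(np.exp(v - m).sum())

def crf_log_likelihood(emissions, transitions, labels):
    """Log-likelihood of one label sequence under a linear-chain CRF.

    emissions:   (T, K) per-position label scores, e.g. produced by a
                 deep feature extractor (assumed given here)
    transitions: (K, K) score of moving from label i to label j
    labels:      length-T gold label sequence
    """
    T, K = emissions.shape
    # Score of the gold path: emission scores plus transition scores.
    score = emissions[0, labels[0]]
    for t in range(1, T):
        score += transitions[labels[t - 1], labels[t]] + emissions[t, labels[t]]
    # Partition function via the forward algorithm (log space),
    # summing over all K**T label sequences in O(T * K^2) time.
    alpha = emissions[0].copy()
    for t in range(1, T):
        alpha = np.array([
            log_sum_exp(alpha + transitions[:, j]) + emissions[t, j]
            for j in range(K)
        ])
    log_Z = log_sum_exp(alpha)
    return score - log_Z
```

A sanity check on the Markov factorization: exponentiating the log-likelihoods of all possible label sequences and summing them gives exactly 1, confirming that the forward recursion computes the correct normalizer.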
More specifically, we propose a hierarchical prior over labels to leverage context information, and present an objective function that combines maximum likelihood with a prior restriction on the model parameters. We test our method on four datasets (MNIST, 20 Newsgroups, CIFAR and OCR) and show advantages over competitive baselines. Another issue in data representation is that the original data may be corrupted by noise or superimposed marks, which degrades recognition accuracy. Because structural noise is hard to remove, we present a deep denoising autoencoder trained in a supervised manner. We also propose a joint k-fan deep model for multi-input and multi-output tasks.
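The denoising idea above can be sketched in a few lines: an autoencoder is trained to reconstruct the clean input from a corrupted copy, so the reconstruction target supervises the denoising. The single tied-weight layer, masking noise, and hyperparameters below are illustrative assumptions; the thesis concerns deep stacks and structural noise rather than this toy setup.

```python
import numpy as np

rng = np.random.default_rng(0)

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

class DenoisingAutoencoder:
    """One-layer denoising autoencoder with tied weights (illustrative).

    Each training step corrupts the input by masking random entries,
    then minimizes squared reconstruction error against the CLEAN input.
    """
    def __init__(self, n_in, n_hidden, lr=0.1):
        self.W = rng.normal(scale=0.1, size=(n_in, n_hidden))
        self.b = np.zeros(n_hidden)   # encoder bias
        self.c = np.zeros(n_in)       # decoder bias
        self.lr = lr

    def step(self, x_clean, mask_p=0.3):
        # Corrupt: zero out a random subset of input entries.
        x = x_clean * (rng.random(x_clean.shape) > mask_p)
        h = sigmoid(x @ self.W + self.b)       # encode corrupted input
        y = sigmoid(h @ self.W.T + self.c)     # decode with tied weights
        # Gradients of 0.5 * ||y - x_clean||^2 through both sigmoids.
        dy = (y - x_clean) * y * (1 - y)
        dpre = (dy @ self.W) * h * (1 - h)
        self.W -= self.lr * (np.outer(dy, h) + np.outer(x, dpre))
        self.c -= self.lr * dy
        self.b -= self.lr * dpre
        return np.mean((y - x_clean) ** 2)
```

Stacking several such layers and training end-to-end gives the deep variant; the supervised signal here is simply the uncorrupted input.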