Sequential pattern classification without explicit feature extraction
Feature selection, representation and extraction are integral to statistical pattern recognition systems. Usually features are represented as vectors that capture expert knowledge of measurable discriminative properties of the classes to be distinguished. The feature selection process entails manual expert involvement and repeated experiments. Automatic feature selection is necessary when (i) expert knowledge is unavailable, (ii) distinguishing features among classes cannot be quantified, or (iii) a fixed-length feature description cannot faithfully reflect all possible variations of the classes, as in the case of sequential patterns (e.g. time series data). Automatic feature selection and extraction are also useful when developing pattern recognition systems that are scalable across new sets of classes. For example, an OCR system designed with an explicit feature selection process for the alphabet of one language usually does not scale to the alphabet of another language. One approach to avoiding explicit feature selection is to use a (dis)similarity representation instead of a feature vector representation. The training set is represented by a similarity matrix, and new objects are classified based on their similarity with samples in the training set. A suitable similarity measure can also be used to increase the classification efficiency of traditional classifiers such as Support Vector Machines (SVMs). In this thesis we establish new techniques for sequential pattern recognition without explicit feature extraction for applications where: (i) a robust similarity measure exists to distinguish classes and (ii) the classifier (such as an SVM) utilizes a similarity measure for both training and evaluation. We investigate the use of similarity measures for applications such as on-line signature verification and on-line handwriting recognition.
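The similarity-based approach above can be illustrated with a minimal sketch: classify a new sequence by its similarity to labeled training sequences, with no feature vectors extracted. Dynamic time warping (DTW) is used here purely as a stand-in similarity measure for variable-length sequences; it is an illustrative choice, not the specific measure developed in the thesis.

```python
def dtw_distance(a, b):
    # Classic O(len(a)*len(b)) dynamic-programming DTW between two
    # 1-D sequences; lower distance means higher similarity.
    n, m = len(a), len(b)
    INF = float("inf")
    D = [[INF] * (m + 1) for _ in range(n + 1)]
    D[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = abs(a[i - 1] - b[j - 1])
            D[i][j] = cost + min(D[i - 1][j],      # insertion
                                 D[i][j - 1],      # deletion
                                 D[i - 1][j - 1])  # match
    return D[n][m]

def classify_1nn(query, train):
    # train: list of (sequence, label) pairs; return the label of the
    # most similar training sequence -- no explicit feature extraction.
    return min(train, key=lambda sl: dtw_distance(query, sl[0]))[1]

train = [([0, 1, 2, 3], "rising"), ([3, 2, 1, 0], "falling")]
print(classify_1nn([0, 1, 1, 2, 3, 3], train))  # -> rising
```

Note that the training set is consulted only through pairwise similarities, which is exactly what allows the same classifier to be reused across new class sets (e.g. a new alphabet) by swapping the training sequences.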
Paucity of training samples can render traditional training methods ineffective, as in the case of on-line signatures, where the number of training samples is rarely greater than 10. We present a new regression measure (ER²) that can classify multi-dimensional sequential patterns without the need for training with a large number of prototypes. When sufficient training prototypes are available, we use ER² as a preprocessing filter to speed up SVM evaluation. We demonstrate the efficacy of a two-stage recognition system by using Principal Component Analysis (PCA) and Recursive Feature Elimination (RFE) in the supervised classification framework of SVM. We present experiments with off-line digit images in which the pixels are ordered in a predetermined manner to simulate sequential patterns. The Generalized Regression Model (GRM) is described for the unsupervised classification (clustering) of sequential patterns.
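To convey the flavor of a regression-based similarity measure such as ER², the sketch below uses the ordinary coefficient of determination (R²) between two equal-length 1-D sequences: a genuine signature should regress well onto a reference template, while a forgery should not. This is a deliberately simplified stand-in; the `verify` helper and its threshold are hypothetical, and the actual ER² measure of the thesis extends this idea to multi-dimensional sequences.

```python
def r_squared(x, y):
    # Coefficient of determination of the least-squares fit y ~ m*x + c,
    # computed as r^2 = Sxy^2 / (Sxx * Syy); ranges from 0 (no linear
    # relationship) to 1 (perfect linear relationship).
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    syy = sum((yi - my) ** 2 for yi in y)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    if sxx == 0 or syy == 0:
        return 0.0  # a constant sequence carries no regression information
    return (sxy * sxy) / (sxx * syy)

def verify(query, references, threshold=0.9):
    # Hypothetical verification rule: accept the query if it regresses
    # well onto at least one reference template. No large prototype set
    # or iterative training is required.
    return max(r_squared(ref, query) for ref in references) >= threshold
```

Because the decision needs only a handful of reference templates, this style of measure suits the few-sample regime of signature verification, and the same score can serve as a cheap first-stage filter before a more expensive SVM evaluation.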