Hierarchical and Dynamic-Relational Models for Handwriting Recognition
The recent growth and ubiquity of mobile touchscreen devices have rejuvenated research in digital (online) handwriting recognition. However, this renewed focus has also led to a shift from classic constrained domains, such as bank check and address recognition, to automatic transcription of unconstrained or free-form handwriting. This shift brings the challenges of larger vocabularies and the absence of domain-specific constraints that may guide recognition algorithms. Given the relative lack of external structure, recognition models have to utilize the intrinsic structure of writing and the rules that govern it. We express this centrality of intrinsic structure to unconstrained handwriting in three important domains of digital handwriting: (a) recognition, where we seek to capture intrinsic structure via multiple local models for the task of recognizing (transcribing) handwritten words; (b) understanding, where we seek to uncover the underlying structural relationships within writing, as a first step towards creating an interpretable knowledge base for inference and reasoning; and (c) identification, where we offer a new conceptualization of the different drivers of intrinsic structure and use this conceptualization for writer identification. This dissertation explores each of these three key research questions in three essays. In our first essay, we present an end-to-end, real-time, online, unconstrained handwritten word recognition system that builds upon and extends conditional random fields (CRFs) by (a) incorporating a two-stage design where the output of multiple local classifiers, each tuned to different structural aspects of writing, forms the input to the CRF, and (b) integrating a lexicon-based, synchronous beam search algorithm into the inference engine of the CRF model.
We empirically demonstrate that these extensions (i) allow us to effectively recruit multiple independently learned local experts to provide a global estimate and (ii) enable efficient probabilistic evaluation of multiple segmentation-cum-recognition word hypotheses in parallel, thereby yielding a system whose recognition performance is better than the current state of the art and which runs in real time. In our second essay, we motivate a new research problem in handwriting: understanding (as opposed to transcription). By understanding we mean frameworks capable of producing knowledge, i.e., building a coherent set of beliefs or generalizable rules that can be used for inference or reasoning. Drawing upon prominent cognitive psychology theories, we lay out the building blocks for such a framework. These building blocks - primitives, relations, and higher-order rules - are mapped to an adaptation of the recently introduced Relational Functional Gradient Boosting (RFGB) model. We use first-order-logic representations to characterize handwritten letters in terms of primitives (shape/relational predicates) and use a structure learning framework (RFGB) to uncover the relationships between these primitives. These uncovered relationships encode the structural descriptions of the letter class, i.e., they generate a generalizable rule base. Results from experiments show that, in addition to performing on par with a well-established ANN benchmark, our framework is able to extract interpretable higher-order rules on the basis of how primitives and relations covary in the data. This represents a first step and a proof of concept for how structure-interpretable knowledge may be generated in a viable recognition framework. Finally, in our third essay, we examine the problem of identifying writers based on their handwriting.
We draw upon theories relating to the genetic and memetic (cultural) factors that underlie human handwriting generation to build a new conceptual model of an individual's writing. Extant research has approached writer identification by assuming that an individual's handwriting is equivalent to his or her writing style; in other words, each person's writing style is taken to be unique and not shared by others. In contrast, we conceptualize a person's handwriting as an individual-specific combination (determined by the person's physiology - genetic factors) of a shared pool of writing styles (often determined culturally - memetic factors). We model this conceptualization using a three-level hierarchical Bayesian model (Latent Dirichlet Allocation), where each writer's handwriting is modeled as a distribution over a finite set of writing styles, each of which is in turn modeled as a distribution over text-independent features. We empirically demonstrate that this framework is efficient and scalable while also offering significant parsimony in both model parameters and data requirements. Overall, the core of this dissertation lies in harnessing the implicit structure of handwriting to develop analytically efficient and theoretically grounded models that advance key research areas in digital handwriting.
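
To make the three-level hierarchy of the third essay concrete, the sketch below simulates the generative side of a standard Latent Dirichlet Allocation model, read in the abstract's terms: a writer draws a mixture over a shared pool of styles, and each observed feature is drawn from one of those styles. It is an illustration only, not the dissertation's implementation. The function names, the symmetric Dirichlet parameter `alpha`, the two toy styles, and the three-codeword feature vocabulary are all assumptions introduced here.

```python
import random


def sample_dirichlet(alpha, rng):
    """Draw one Dirichlet sample by normalising independent Gamma draws."""
    draws = [rng.gammavariate(a, 1.0) for a in alpha]
    total = sum(draws)
    return [d / total for d in draws]


def generate_writer_sample(style_feature_probs, alpha, n_features, rng):
    """Simulate one writer's bag of features under an LDA-style model.

    style_feature_probs: one row per shared style (hypothetical values),
        each row a probability distribution over feature codewords.
    alpha: symmetric Dirichlet concentration (an assumed hyperparameter).
    """
    n_styles = len(style_feature_probs)
    vocab = range(len(style_feature_probs[0]))

    # Level 1: the writer-specific mixture over the shared pool of styles.
    style_mix = sample_dirichlet([alpha] * n_styles, rng)

    features = []
    for _ in range(n_features):
        # Level 2: choose which style produces this feature.
        style = rng.choices(range(n_styles), weights=style_mix)[0]
        # Level 3: choose a feature codeword from that style.
        feature = rng.choices(vocab, weights=style_feature_probs[style])[0]
        features.append(feature)
    return style_mix, features


if __name__ == "__main__":
    rng = random.Random(0)
    toy_styles = [[0.7, 0.2, 0.1], [0.1, 0.2, 0.7]]  # hypothetical styles
    mix, feats = generate_writer_sample(toy_styles, 0.5, 20, rng)
    print("style mixture:", [round(m, 3) for m in mix])
    print("features:", feats)
```

Under this reading, writer identification would amount to inferring each writer's style mixture from observed features and comparing mixtures across writers; the inference step is not shown here.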