Exploiting the gap between human and machine abilities in handwriting recognition for Web security applications
Automated recognition of unconstrained handwriting continues to be a challenging research task. In contrast to the traditional role of handwriting recognition in applications such as postal automation and bank check reading, in this thesis we have explored the use of handwriting recognition for Web security. HIPs (Human Interactive Proofs) are automatic reverse Turing tests designed so that virtually all humans can pass them but state-of-the-art computer programs fail. HIPs based on machine-printed text are now commonly used to defend against bot attacks. We have investigated a new methodology for designing efficient HIPs that exploits the gap between the abilities of humans and computers in reading images of handwritten text. We have addressed three specific research problems: (1) the design of an algorithm that automatically generates an unlimited number of random, distinct handwritten HIPs; (2) a method for quantifying the strengths of human reading abilities and the weaknesses of state-of-the-art handwriting recognizers; and (3) the identification of the parameters (slant, stroke width, character gaps, etc.) that can be controlled so that the HIPs are human-readable but not machine-readable. We have used a large repository of handwritten word images that current handwriting recognizers cannot read, even when provided with a lexicon. To generate synthetic word images, we have simulated handwriting with a tracing program that follows the contours of real scanned characters and represents them with third-order splines, which are then manipulated to produce variations in writing style. The idea is to design HIPs that exploit knowledge of the common sources of error in automated handwriting recognition systems while taking advantage of the salient aspects of human reading.
For example, humans can tolerate intermittent breaks in strokes (by the Gestalt law of continuity), but current programs fail when the breaks vary in size or exceed certain thresholds. The simultaneous interplay of several Gestalt laws of perception makes it harder to find the range of parameters that separates human from machine abilities. We have conducted extensive experiments, using both simulated handwritten word images and real images transformed by a suite of controlled parameters, that reaffirm the superiority of humans in reading handwritten text, especially under conditions of low image quality, clutter, and occlusion. We have also demonstrated empirically that handwritten HIPs are more efficient than those currently in use and are a viable option for challenge-response protocols in Web security applications.
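As a rough illustration of the generation and distortion steps described above, the Python sketch below assumes each character is already available as a list of cubic Bézier segments (one possible reading of "third-order splines"; the tracing program itself is not reproduced). `render_word` applies two of the controllable parameters, slant and inter-character gap, and `break_stroke` deletes runs of points of random length to simulate intermittent stroke breaks of varying size. All function and parameter names, and measuring break length in sample points, are hypothetical choices for this sketch, not the thesis's implementation.

```python
import math
import random
from typing import List, Sequence, Tuple

# Hypothetical sketch, not the thesis's tracing or generation program.
Point = Tuple[float, float]
Segment = Tuple[Point, Point, Point, Point]  # cubic Bezier: four control points


def bezier_point(seg: Segment, t: float) -> Point:
    """Evaluate a cubic Bezier segment at parameter t in [0, 1]."""
    (x0, y0), (x1, y1), (x2, y2), (x3, y3) = seg
    u = 1.0 - t
    b0, b1, b2, b3 = u ** 3, 3 * u * u * t, 3 * u * t * t, t ** 3
    return (b0 * x0 + b1 * x1 + b2 * x2 + b3 * x3,
            b0 * y0 + b1 * y1 + b2 * y2 + b3 * y3)


def render_word(chars: Sequence[Sequence[Segment]], widths: Sequence[float],
                slant_deg: float, gap: float, samples: int = 16) -> List[List[Point]]:
    """Lay characters out left to right with an extra `gap` between them,
    shear them by `slant_deg` (y measured up from the baseline), and sample
    each segment into a polyline of `samples` points."""
    shear = math.tan(math.radians(slant_deg))
    polylines: List[List[Point]] = []
    offset = 0.0
    for segs, width in zip(chars, widths):
        for seg in segs:
            # Shear and shift the control points; because the transform is
            # affine, this equals transforming the sampled curve itself.
            moved = tuple((x + shear * y + offset, y) for x, y in seg)
            polylines.append([bezier_point(moved, i / (samples - 1))
                              for i in range(samples)])
        offset += width + gap
    return polylines


def break_stroke(points: List[Point], n_breaks: int, min_gap: int, max_gap: int,
                 rng: random.Random) -> List[List[Point]]:
    """Delete `n_breaks` runs of consecutive points, each between `min_gap`
    and `max_gap` points long, never deleting either endpoint; return the
    surviving pieces in order (overlapping breaks may merge)."""
    keep = [True] * len(points)
    for _ in range(n_breaks):
        gap = rng.randint(min_gap, max_gap)
        if gap > len(points) - 2:
            continue  # stroke too short for a break of this size
        start = rng.randrange(1, len(points) - gap)
        for i in range(start, start + gap):
            keep[i] = False
    pieces: List[List[Point]] = []
    current: List[Point] = []
    for p, k in zip(points, keep):
        if k:
            current.append(p)
        elif current:
            pieces.append(current)
            current = []
    if current:
        pieces.append(current)
    return pieces
```

For instance, `render_word([[stroke], [stroke]], [1.0, 1.0], slant_deg=20.0, gap=0.5)` lays out two copies of a hypothetical single-segment `stroke`, and `break_stroke(line, 3, 2, 5, random.Random(0))` cuts one sampled polyline into at most four pieces. Drawing the break size from a range, rather than using a fixed value, mirrors the observation that recognizers struggle when breaks vary in size; stroke width is left to the rasterization step, which this sketch omits.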