Enhancing cyber security through the use of synthetic handwritten CAPTCHAs
Thomas, Achint Oommen
Online services that allow users to contribute content and interact remotely over the internet are common today. Many of these services, such as spam control for blogs and email account sign-up, must be accessed only by humans and not by machines (automated scripts or bots). One method of differentiating between humans and bots is a CAPTCHA (Completely Automated Public Turing test to tell Computers and Humans Apart). Several genres of CAPTCHA exist: text-based, visual, auditory, and cognitive. Text-based CAPTCHAs are popular because automatic recognition of degraded, noisy, distorted text with background clutter is still a challenging task for machines, yet one that humans perform with relative ease. Recently, however, a significant number of printed-text CAPTCHAs have been successfully attacked by bots, leaving the services they protect vulnerable. There is therefore an urgent need to explore alternative CAPTCHAs, and this need is the prime motivation for our research.
We explore three primary tracks of investigation in this work. First, we define a set of sound design principles, based on an exploit-avoid-resist philosophy, which must be adhered to when building secure CAPTCHAs.
Second, we improve the effectiveness of text-based CAPTCHAs by substituting handwritten text for printed text and then layering on additional cognitive tasks. To this end, we develop a fully automated framework for synthetic handwriting generation, used to design handwritten CAPTCHAs that exploit the differential in handwriting reading proficiency between humans and machines. Prior work in this area has focused on synthesizing handwritten textlines that conform to a particular user’s style. We present techniques for simulating handwriting without being writer-specific. Unlike previous work, our approach is fully automated and is based on extracting principal curves from handwritten characters. These curves serve as a set of control points that allow character-level distortion. We use novel techniques for character baseline detection and ligature parameterization to construct the textlines, and a parameterized sinusoid-based function to allow random perturbation of these textlines. Using this framework as a basis, we present handwritten CAPTCHAs that perform better than current text-based CAPTCHAs at distinguishing between humans and machines. We also present a novel handwritten CAPTCHA that exploits the mixed-text segmentation problem to deliver sub-0.01% machine recognition rates while retaining respectable human performance.
Third, we present in general terms a new class of CAPTCHA, the interaction-based CAPTCHA, which requires an entity to interact with the challenge to gain access to the solution space. We show that solving an interaction-based CAPTCHA challenge requires an entity to complete three tasks: interaction, cognition, and recognition. Additionally, we present the 3D shadow CAPTCHA, a specific instance of this new class. The 3D shadow CAPTCHA uses aspects of 3D scene rendering, ray casting, and perspective projection to present unique challenges to machines while remaining intuitive for humans to solve.
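As a rough illustration of the sinusoid-based textline perturbation mentioned in the abstract, the sketch below shifts the points of a textline vertically by a sine wave whose amplitude, wavelength, and phase are drawn at random. The abstract does not specify the actual function or its parameters, so the function names, the parameter ranges, and the choice of a single vertical sine term are all assumptions made for illustration, not the method used in the thesis.

```python
import math
import random


def perturb_baseline(points, amplitude, wavelength, phase):
    """Shift each (x, y) point vertically by a sinusoid evaluated at x.

    Hypothetical helper: one sine term stands in for the parameterized
    sinusoid-based function, whose exact form the abstract does not give.
    """
    return [
        (x, y + amplitude * math.sin(2.0 * math.pi * x / wavelength + phase))
        for x, y in points
    ]


def random_perturbation(points, max_amplitude=4.0,
                        min_wavelength=80.0, max_wavelength=240.0, rng=None):
    """Apply perturb_baseline with randomly drawn parameters.

    The default ranges are illustrative assumptions (in pixel units).
    """
    rng = rng or random.Random()
    amplitude = rng.uniform(0.0, max_amplitude)
    wavelength = rng.uniform(min_wavelength, max_wavelength)
    phase = rng.uniform(0.0, 2.0 * math.pi)
    return perturb_baseline(points, amplitude, wavelength, phase)


# Usage: a flat textline sampled every 10 pixels, then randomly perturbed.
flat_line = [(float(x), 10.0) for x in range(0, 200, 10)]
wavy_line = random_perturbation(flat_line, rng=random.Random(0))
```

Because only the vertical coordinate changes, the horizontal order of the points is preserved, and the amplitude bounds how far any point can move from the original textline.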