Privacy Preserving Representation Learning using Deep Neural Networks
Pandey, Rohit Kumar
Privacy is a growing concern given the large digital footprint we leave behind every day. Because personal data is sensitive, there are concerns about it falling into the wrong hands. One widely employed solution is to store data in a protected form that still enables matching against it. For example, for string-based passwords, a SHA-512 hash of the string is stored, since the strings are expected to match exactly. This is more challenging for applications that require privacy-preserving matching of data for which matching is not expected to be exact. Biometric authentication is one such application. Modalities like face, fingerprint, and voice are increasingly replacing passwords as they provide convenience and, often, higher security. The fuzzy nature of biometric data (due to variations in sensors, environmental conditions, etc.) makes it difficult to directly employ the hash-based methods used for string password protection. Several alternative algorithms (fuzzy commitment, fuzzy vault, etc.) tackle this problem by enabling data protection with some degree of error tolerance during matching, but generally suffer from limited error-correcting capacity and short keys. Algorithms that have attempted hash-based security suffer from uniformity issues and low matching accuracy. There has also been work that combines the biometric data with user-specific keys, but its performance is often derived from the external key. The goal of this dissertation is to minimize the resulting trade-off between template security and matching accuracy, without unrealistic assumptions. We seek to combine the representation learning ability of deep neural networks with information-theoretic data protection techniques to develop algorithms that 1) provide provable hash-based template security, 2) achieve security without compromising on matching accuracy, and 3) do not use external keys.
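The contrast between exact password matching and fuzzy biometric matching can be illustrated with a minimal Python sketch. It is not taken from the dissertation: the helper names (`protect`, `matches`, `hamming_bits`) and the 64-byte stand-in "template" are assumptions made for illustration, and the unsalted hash is a simplification of how passwords are stored in practice. The sketch shows that a SHA-512 digest supports exact matching, while flipping a single bit of the input yields an unrelated digest, so a slightly noisy biometric sample would fail to match.

```python
import hashlib
import hmac


def protect(data: bytes) -> str:
    """Return the SHA-512 hex digest that would be stored instead of the raw data."""
    return hashlib.sha512(data).hexdigest()


def matches(candidate: bytes, stored_digest: str) -> bool:
    """Exact matching in the protected domain: hash the candidate, compare digests."""
    return hmac.compare_digest(protect(candidate), stored_digest)


def hamming_bits(a: bytes, b: bytes) -> int:
    """Number of differing bits between two equal-length byte strings."""
    return sum(bin(x ^ y).count("1") for x, y in zip(a, b))


# Case 1: string passwords are expected to match exactly, so hashing works.
stored = protect(b"correct horse battery staple")
password_ok = matches(b"correct horse battery staple", stored)
password_bad = matches(b"correct horse battery stapl3", stored)

# Case 2: a hypothetical 64-byte binary "template" (illustrative only).
enrolled = bytes(range(64))
probe = bytearray(enrolled)
probe[0] ^= 0x01  # simulate sensor noise by flipping a single bit
probe = bytes(probe)

input_distance = hamming_bits(enrolled, probe)
digest_distance = hamming_bits(
    hashlib.sha512(enrolled).digest(),
    hashlib.sha512(probe).digest(),
)
fuzzy_ok = matches(probe, protect(enrolled))

print("exact password match:", password_ok)
print("wrong password match:", password_bad)
print("input bits changed:", input_distance)
print("digest bits changed (out of 512):", digest_distance)
print("noisy template match:", fuzzy_ok)
```

The inputs differ in one bit, yet the digests differ in a large fraction of their 512 bits, which is why hash-based protection cannot be applied to raw biometric data without first mapping noisy samples to a stable code.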
We design three algorithms for matching in a protected domain. Local Region Hashing is a hash-based template protection algorithm for faces that achieves high template security but suffers from reduced matching accuracy and issues related to the non-uniformity of the representation space. Deep Secure Encoding overcomes the non-uniformity issue by using a deep convolutional neural network to learn a robust mapping of data to maximum entropy binary codes. The algorithm achieves state-of-the-art matching performance with high template security on the PIE, Yale, and Multi-PIE face databases. Deep Stochastic Hashing addresses the need to re-train in order to enroll new users and provides preliminary results for future research in the relatively new field of representation learning for privacy-preserving matching. The proposed research combines state-of-the-art matching algorithms from the deep learning community with information-theoretic data protection techniques to design algorithms that ensure high standards of privacy protection with minimal compromise on matching accuracy. This enables wider acceptance of privacy protection techniques by both commercial and government entities and the end user. Furthermore, the algorithms are not limited to biometric authentication and will encourage research in privacy-preserving data storage for a wide set of applications.