Learning on private data with homomorphic encryption and differential privacy
Today, the growing concern over privacy poses a challenge to the study of sensitive data. In this thesis, we address learning on private data in two practical scenarios.

1) It is common for the same type of data to be distributed among multiple parties, each holding a local portion. Learning based only on a party's own portion may suffer from the small-sample problem and yield unsatisfactory results, while privacy concerns prevent the parties from exchanging their data and learning global results from the union of the data. In this scenario, we solve the problem with homomorphic encryption. Homomorphic encryption enables calculation in the cipher space: certain operations on the data can be carried out even while the data remain encrypted. With this technique, we design privacy-preserving solutions for four popular data analysis methods on distributed data: Marginal Fisher Analysis (MFA) for dimensionality reduction and classification, the Kruskal-Wallis (KW) statistical test for comparing sample distributions, the Markov model for sequence classification, and the Fisher criterion score for informative gene selection. Our solutions allow the parties to run these algorithms on the union of their data without revealing any party's private information.

2) In the other scenario, the data holder wants to release knowledge learned from a sensitive dataset without violating the privacy of the individuals represented in it. Although no direct data exchange is needed here, publishing the knowledge learned from the data can still expose the participants' private information. We therefore adopt the rigorous differential privacy model to protect individual privacy.
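The cipher-space computation underlying the first scenario can be illustrated with a toy additively homomorphic scheme. The sketch below is a minimal Paillier example with illustrative parameters (the primes and variable names are our assumptions, not taken from the thesis, and are far too small for real security):

```python
import random
from math import gcd

# Toy Paillier cryptosystem (additively homomorphic).
# Demo-only key material: these primes are illustrative assumptions.
p, q = 1009, 1013
n, n2 = p * q, (p * q) ** 2
g = n + 1                                       # standard generator choice
lam = (p - 1) * (q - 1) // gcd(p - 1, q - 1)    # lcm(p-1, q-1)
mu = pow(lam, -1, n)                            # valid because g = n + 1

def encrypt(m):
    r = random.randrange(2, n)
    while gcd(r, n) != 1:                       # r must be a unit mod n
        r = random.randrange(2, n)
    return (pow(g, m, n2) * pow(r, n, n2)) % n2

def decrypt(c):
    return (pow(c, lam, n2) - 1) // n * mu % n

# Homomorphic property: multiplying ciphertexts adds plaintexts, so
# parties can combine encrypted values without ever decrypting them.
a, b = 123, 456
combined = (encrypt(a) * encrypt(b)) % n2
assert decrypt(combined) == (a + b) % n
```

This additive property is what lets aggregate statistics (sums, counts, score components) be computed over the union of several parties' data while each portion stays encrypted.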
Specifically, if an algorithm is differentially private, the presence or absence of any single data instance in the training dataset changes the output of the algorithm only slightly. Consequently, from the released output one cannot learn much about any individual who participated in the training dataset, and individual privacy is protected. In this scenario, we develop differentially private One-Class SVM (1-SVM) models for anomaly detection, with theoretical proofs of both privacy and utility. The learned differentially private 1-SVM models can be released for others to perform anomaly detection without violating the privacy of the individuals in the training dataset.
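The differential privacy guarantee described above can be sketched with the classic Laplace mechanism for a counting query (an illustration of the general idea only, not the thesis's 1-SVM construction; the function names and data are hypothetical):

```python
import random

# Laplace mechanism for a counting query. Adding or removing one record
# changes a count by at most 1 (sensitivity = 1), so Laplace noise with
# scale 1/epsilon gives epsilon-differential privacy for the count.
def laplace_noise(scale):
    # The difference of two Exp(1) variables is a standard Laplace variable.
    return scale * (random.expovariate(1.0) - random.expovariate(1.0))

def private_count(records, predicate, epsilon):
    true_count = sum(1 for r in records if predicate(r))
    return true_count + laplace_noise(1.0 / epsilon)

ages = [23, 35, 41, 29, 52, 38, 47, 31]
noisy = private_count(ages, lambda x: x > 40, epsilon=0.5)  # true count is 3
```

A smaller epsilon injects more noise and gives a stronger privacy guarantee; releasing only such noised outputs (or, as in this thesis, a perturbed model) is what allows knowledge to leave the data holder while the raw records never do.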