Improving a De-identification Algorithm that Achieves both k-Anonymity and Differential Privacy
Nov 23, 2019

Tianyu Li
Abstract
Data plays an increasingly important role in our lives. For citizens, data is all around them, and that data includes confidential personal information, so people want it kept safe for privacy reasons. Curators, however, must publish datasets for research or commercial purposes, which raises a problem: how to preprocess a dataset before publication so that it both protects individuals' privacy and retains useful information for different uses. To address this, k-anonymity and differential privacy provide syntactic and semantic guarantees, respectively. Previous work combined k-anonymity with differential privacy and proposed a k-anonymous, differentially private algorithm: ε-safe lattice anonymization (ε-safe LA). In this project, we have reproduced ε-safe LA with an adapted utility function and sensitivity, which corrects mistakes in the original work and guarantees that ε-safe LA satisfies (β,ε,δ)-differential privacy under sampling. We have carried out a series of experiments to explore how the parameters of ε-safe LA influence the probability of outputting a useful dataset. We have also evaluated ε-safe LA against other algorithms in terms of accuracy on learning tasks and running time, finding that ε-safe LA achieves moderate accuracy at the cost of a long running time. To improve its efficiency, we proposed and evaluated two interpolation methods that can roughly halve the algorithm's running time.
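To make the syntactic guarantee concrete, the sketch below (an illustrative example, not the thesis's ε-safe LA implementation; the field names and data are invented) checks whether a generalized table satisfies k-anonymity, i.e., whether every combination of quasi-identifier values appears in at least k records:

```python
from collections import Counter

def is_k_anonymous(records, quasi_ids, k):
    """Return True if every quasi-identifier combination occurs in >= k records."""
    groups = Counter(tuple(r[q] for q in quasi_ids) for r in records)
    return all(count >= k for count in groups.values())

# Toy generalized dataset: age range and masked zip code are the quasi-identifiers.
data = [
    {"age": "30-40", "zip": "1234*", "disease": "flu"},
    {"age": "30-40", "zip": "1234*", "disease": "cold"},
    {"age": "30-40", "zip": "1234*", "disease": "flu"},
    {"age": "20-30", "zip": "5678*", "disease": "cold"},
]

print(is_k_anonymous(data, ["age", "zip"], 2))  # False: one group has only 1 record
print(is_k_anonymous(data, ["age", "zip"], 1))  # True
```

An anonymization algorithm such as ε-safe LA searches over levels of generalization until a check like this passes, while the differential-privacy component bounds what any single record can reveal.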
Type
Publication
Master Thesis
