A Kernel Density Estimation Method to Generate Synthetic Shifted Datasets in Privacy-Preserving Task

Muhammad Syafiq Mohd Pozi+ and Mohd. Hasbullah Omar
 

School of Computing, Universiti Utara Malaysia, 06010, Sintok, Kedah, Malaysia
{syafiq.pozi, mhomar}@uum.edu.my

   

Abstract

A comprehensive analytic task requires a complete dataset to be available in the first place. However, due to privacy concerns, demands to share a full dataset with third parties can rarely be fulfilled. Systematic synthetic data generation has recently been explored and identified as a suitable approach to preserving data privacy. In this work, we propose a privacy-preserving, probability-based synthetic data generation framework for supervised data analytics. Using a generative model that captures and represents the probability density function of the dataset features, a new privacy-preserving synthetic dataset is synthesized such that it is statistically different from the original dataset. We then simulate a supervised learning task using two different machine learning classifiers to compare the utility of the original dataset and the new privacy-preserving synthesized dataset. The experimental results show that the proposed synthetic generation model can produce a new privacy-preserving synthesized dataset that has data utility similar to that of the original dataset.
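The pipeline summarized in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not state the kernel, the bandwidth, or the two classifiers, so the Gaussian kernel, the Scott's-rule bandwidth, the toy two-class data, and the nearest-centroid classifier below are all assumptions chosen for illustration. The function names `kde_synthesize` and `nearest_centroid_accuracy` are hypothetical.

```python
import numpy as np


def kde_synthesize(X, y, seed=0):
    """Draw a synthetic labelled dataset from per-class Gaussian KDEs.

    Assumption: Gaussian kernel with a diagonal, Scott's-rule bandwidth.
    """
    rng = np.random.default_rng(seed)
    X = np.asarray(X, dtype=float)
    y = np.asarray(y)
    X_parts, y_parts = [], []
    for label in np.unique(y):
        Xc = X[y == label]
        n, d = Xc.shape
        # Scott's rule factor scaled by each feature's standard deviation
        h = n ** (-1.0 / (d + 4)) * Xc.std(axis=0, ddof=1)
        # Sampling from a Gaussian KDE: pick a stored point, add kernel noise
        centers = Xc[rng.integers(0, n, size=n)]
        X_parts.append(centers + rng.normal(size=(n, d)) * h)
        y_parts.append(np.full(n, label))
    return np.vstack(X_parts), np.concatenate(y_parts)


def nearest_centroid_accuracy(X_train, y_train, X_test, y_test):
    """Utility proxy (assumption): accuracy of a nearest-centroid classifier."""
    labels = np.unique(y_train)
    centroids = np.stack([X_train[y_train == c].mean(axis=0) for c in labels])
    dists = np.linalg.norm(X_test[:, None, :] - centroids[None, :, :], axis=2)
    pred = labels[dists.argmin(axis=1)]
    return float((pred == y_test).mean())


# Toy data (assumption): two well-separated Gaussian classes in 2-D
rng = np.random.default_rng(42)
X_real = np.vstack([rng.normal(0, 1, (200, 2)), rng.normal(4, 1, (200, 2))])
y_real = np.repeat([0, 1], 200)
X_test = np.vstack([rng.normal(0, 1, (100, 2)), rng.normal(4, 1, (100, 2))])
y_test = np.repeat([0, 1], 100)

X_syn, y_syn = kde_synthesize(X_real, y_real)

# Train on original vs. synthetic data, evaluate on the same held-out set
acc_real = nearest_centroid_accuracy(X_real, y_real, X_test, y_test)
acc_syn = nearest_centroid_accuracy(X_syn, y_syn, X_test, y_test)
print(f"accuracy trained on original:  {acc_real:.3f}")
print(f"accuracy trained on synthetic: {acc_syn:.3f}")
```

Training on the original and on the synthetic data, then scoring both against the same held-out set, mirrors the utility comparison described in the abstract. The synthetic rows are perturbed draws from the estimated density, not copies of the original records.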

Keywords: Privacy Preservation, Dataset Shift, Data Anonymization, Differential Privacy

 

+: Corresponding author: Muhammad Syafiq Mohd Pozi
School of Computing, Universiti Utara Malaysia, 06010, Sintok, Kedah, Malaysia, Tel: +60-(0)4-928-5217

 

Journal of Internet Services and Information Security (JISIS), 10(4): 70-89, November 2020

DOI: 10.22667/JISIS.2020.11.30.070