Differentially Private Synthesization of Multi-Dimensional Data using Copula Functions
Abstract
Differential privacy has recently emerged in private statistical data release as one of the strongest privacy guarantees. Most existing techniques that generate differentially private histograms or synthetic data work well only for single-dimensional or low-dimensional histograms. They become problematic for high-dimensional, large-domain data because of increased perturbation error and computational complexity. In this paper, we propose DPCopula, a differentially private data synthesization technique for multi-dimensional data that uses copula functions. The core of our method is to compute a differentially private copula function from which we can sample synthetic data. Copula functions describe the dependence among the components of a multivariate random vector and allow us to build the multivariate joint distribution from one-dimensional marginal distributions. We present two methods for estimating the parameters of the copula functions with differential privacy: maximum likelihood estimation and Kendall's τ estimation. We present formal proofs of the privacy guarantee as well as of the convergence property of our methods. Extensive experiments using both real and synthetic datasets demonstrate that DPCopula generates highly accurate synthetic multi-dimensional data with significantly better utility than state-of-the-art techniques.
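To make the pipeline described in the abstract concrete, the following is a minimal sketch of a Kendall's τ based variant, not the authors' implementation. It assumes a Gaussian copula (for which ρ = sin(πτ/2)), Laplace noise on each pairwise τ and on each histogram bin, data already scaled to a public range [0, 1], an even split of the privacy budget between marginals and correlations, and a sensitivity bound of 4/(n+1) for τ. All function names (`kendall_tau`, `dp_correlation`, `dp_marginal`, `sample_synthetic`) and parameter choices are hypothetical.

```python
import math
import numpy as np

def kendall_tau(x, y):
    # Pairwise concordance count; O(n^2) memory, acceptable for a sketch.
    dx = np.sign(x[:, None] - x[None, :])
    dy = np.sign(y[:, None] - y[None, :])
    n = len(x)
    return float((dx * dy).sum() / (n * (n - 1)))

def dp_correlation(data, epsilon, rng):
    n, d = data.shape
    pairs = d * (d - 1) // 2
    # ASSUMPTION: each tau has sensitivity 4/(n+1); budget split evenly over pairs.
    scale = (4.0 / (n + 1)) * pairs / epsilon
    rho = np.eye(d)
    for i in range(d):
        for j in range(i + 1, d):
            tau = kendall_tau(data[:, i], data[:, j]) + rng.laplace(0.0, scale)
            tau = min(1.0, max(-1.0, tau))
            # Gaussian-copula relation between Kendall's tau and correlation.
            rho[i, j] = rho[j, i] = math.sin(math.pi * tau / 2.0)
    # Noise can break positive definiteness: clip eigenvalues, then rescale
    # so the diagonal is 1 again (post-processing, no extra privacy cost).
    w, v = np.linalg.eigh(rho)
    rho = (v * np.clip(w, 1e-6, None)) @ v.T
    s = np.sqrt(np.diag(rho))
    return rho / np.outer(s, s)

def dp_marginal(column, bins, epsilon, rng, value_range=(0.0, 1.0)):
    # ASSUMPTION: value_range is public and one record changes one count by 1.
    counts, edges = np.histogram(column, bins=bins, range=value_range)
    noisy = np.clip(counts + rng.laplace(0.0, 1.0 / epsilon, size=bins), 0.0, None)
    if noisy.sum() == 0:
        noisy = np.ones(bins)
    return np.cumsum(noisy) / noisy.sum(), edges

def sample_synthetic(data, epsilon, n_samples, bins=20, seed=0):
    rng = np.random.default_rng(seed)
    n, d = data.shape
    # Half the budget for the correlation matrix, half shared by the d marginals.
    rho = dp_correlation(data, epsilon / 2.0, rng)
    z = rng.multivariate_normal(np.zeros(d), rho, size=n_samples)
    # Standard normal CDF maps correlated Gaussians to correlated uniforms.
    u = 0.5 * (1.0 + np.vectorize(math.erf)(z / math.sqrt(2.0)))
    out = np.empty_like(u)
    for j in range(d):
        cdf, edges = dp_marginal(data[:, j], bins, epsilon / (2.0 * d), rng)
        # Inverse of the noisy marginal CDF, returning bin midpoints.
        idx = np.minimum(np.searchsorted(cdf, u[:, j]), bins - 1)
        out[:, j] = 0.5 * (edges[idx] + edges[idx + 1])
    return out
```

Calling `sample_synthetic(data, epsilon=1.0, n_samples=1000)` on an `(n, d)` array scaled to [0, 1] returns a `(1000, d)` array of synthetic records. The key design point is that only d one-dimensional histograms and d(d-1)/2 pairwise coefficients are perturbed, rather than a full d-dimensional histogram.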
Similar resources
DPSynthesizer: Differentially Private Data Synthesizer for Privacy Preserving Data Sharing
Differential privacy has recently emerged in private statistical data release as one of the strongest privacy guarantees. Releasing synthetic data that mimic original data with differential privacy provides a promising way for privacy preserving data sharing and analytics while providing a rigorous privacy guarantee. However, to date there are no open-source tools that allow users to genera...
The Comparison Between Goodness of Fit Tests for Copula
Copula functions can model the relationship between variables. The appropriate copula function for a specific application is the one that best captures the dependency in the data. Goodness-of-fit tests are, in theory, the best way to select a copula function. Several goodness-of-fit approaches for copulas exist. In this paper we will examine the goodness of fit test...
Efficient Lipschitz Extensions for High-Dimensional Graph Statistics and Node Private Degree Distributions
Lipschitz extensions were recently proposed as a tool for designing node differentially private algorithms. However, efficiently computable Lipschitz extensions were known only for 1-dimensional functions (that is, functions that output a single real value). In this paper, we study efficiently computable Lipschitz extensions for multi-dimensional (that is, vector-valued) functions on graphs. We...
Joint Risk Analysis of Meteorological Droughts (Case Study of East Iran)
Droughts are extreme phenomena that are characterized by their continuity in time and their spatial effects, and they can occur in any climatic situation. Understanding the behavior of droughts, which is closely and directly related to water resources management, is of particular importance. The main purpose of this study is to assess the risk of drought using Copula f...
Self-selection models for public and private sector job
We discuss a class of copula-based ordered probit models with endogenous switching. Such models can be useful for the analysis of self-selection in subjective well-being equations in general, and job satisfaction in particular, where assignment of regressors may be endogenous rather than random, resulting from individual maximization of well-being. In an application to public and private sector...
Journal title:
- Advances in database technology : proceedings. International Conference on Extending Database Technology
Volume 2014
Publication date 2014