Towards Accurate Histogram Publication under Differential Privacy
Abstract
Histograms are the workhorse of data mining and analysis. This paper considers the problem of publishing histograms under differential privacy, one of the strongest privacy models. Existing differentially private histogram publication schemes have shown that clustering (or grouping) is a promising way to improve the accuracy of sanitized histograms, but none of them fully exploits its benefit. We introduce a new clustering framework that explicitly evaluates the trade-off between the approximation error caused by clustering and the Laplace error caused by the injected Laplace noise, a trade-off that prior work normally overlooks. Within this framework, we propose three clustering strategies with different orders of run-time complexity. We establish the superiority of our approach through theoretical utility comparisons with competing schemes, and extensive experiments on standard real-life and synthetic datasets confirm that our technique consistently outperforms them.
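The trade-off described in the abstract can be illustrated with a minimal sketch. It is not the paper's algorithm: it assumes the clusters are contiguous bin ranges fixed in advance (independently of the data), that each record contributes to exactly one bin (sensitivity 1), and that each cluster's noisy sum is spread evenly over its bins. The function names `laplace_noise`, `clustered_histogram`, and `expected_squared_error` are hypothetical and chosen only for illustration.

```python
import random


def laplace_noise(scale, rng):
    # The difference of two i.i.d. exponentials with mean `scale`
    # is distributed as Laplace(0, scale).
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)


def clustered_histogram(counts, clusters, epsilon, rng):
    # `clusters` is a list of half-open (start, end) bin ranges that
    # partition the histogram. ASSUMPTION: the partition is fixed
    # without looking at the data, so the whole budget goes to noise.
    out = [0.0] * len(counts)
    for start, end in clusters:
        size = end - start
        # One Laplace draw per cluster (sensitivity 1, scale 1/epsilon).
        noisy_sum = sum(counts[start:end]) + laplace_noise(1.0 / epsilon, rng)
        for i in range(start, end):
            out[i] = noisy_sum / size  # every bin reports the noisy mean
    return out


def expected_squared_error(counts, clusters, epsilon):
    # Sum over clusters of:
    #   approximation error: sum of (count - cluster mean)^2
    #   Laplace error:       Var(noise) / size = 2 / (epsilon^2 * size)
    total = 0.0
    for start, end in clusters:
        block = counts[start:end]
        mean = sum(block) / len(block)
        approximation = sum((c - mean) ** 2 for c in block)
        laplace = 2.0 / (epsilon ** 2 * len(block))
        total += approximation + laplace
    return total


if __name__ == "__main__":
    counts = [10, 10, 11, 50, 52, 51]
    eps = 0.1
    partitions = {
        "no clustering": [(i, i + 1) for i in range(len(counts))],
        "two clusters": [(0, 3), (3, 6)],
        "one cluster": [(0, 6)],
    }
    for name, clusters in partitions.items():
        print(name, expected_squared_error(counts, clusters, eps))
    print(clustered_histogram(counts, partitions["two clusters"], eps, random.Random(0)))
```

In this toy histogram, merging bins with similar counts lowers the total expected error because the noise is shared across the cluster, while merging all bins into one cluster raises it because the approximation error dominates. A data-dependent choice of clusters, as in the paper, would additionally have to account for the privacy cost of that choice, which this sketch leaves out.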
Similar resources
New Statistical Applications for Differential Privacy
Differential privacy is a relatively recent development in the field of privacy-preserving data mining, which was formulated to give a mathematically rigorous definition of privacy. The concept has spawned a great deal of work regarding the development of algorithms which are privacy-preserving under this definition, and also work which seeks to understand the fundamental limitations of such al...
Difference Privacy Histogram Release Based on Isotonic Regression
Data release is likely to result in privacy disclosure, so appropriate privacy protection measures are required for various data release technologies in order to ensure the privacy and safety of information, while differential privacy as a reliable model for privacy protection is extensively researched and applied. This paper presents the histogram data publishing solutions under differential p...
Engineering Methods for Differentially Private Histograms: Efficiency Beyond Utility
We focus on the problem of differentially private histogram publication, for range-sum query answering. Specifically, we derive a histogram from a given dataset, such that (i) it satisfies ε-differential privacy, and (ii) it achieves high utility for queries that request the sum of contiguous histogram bins. Existing schemes are distinguished into two categories: fast but oblivious to utility op...
Privacy-Enhanced and Multifunctional Health Data Aggregation under Differential Privacy Guarantees
With the rapid growth of the health data scale, the limited storage and computation resources of wireless body area sensor networks (WBANs) are becoming a barrier to their development. Therefore, outsourcing the encrypted health data to the cloud has been an appealing strategy. However, data aggregation will become difficult. Some recently-proposed schemes try to address this problem. However, t...
Boosting the Accuracy of Differentially Private Histograms Through Consistency
We show that it is possible to significantly improve the accuracy of a general class of histogram queries while satisfying differential privacy. Our approach carefully chooses a set of queries to evaluate, and then exploits consistency constraints that should hold over the noisy output. In a postprocessing phase, we compute the consistent input most likely to have produced the noisy output. The...