Utility-Preserving Differentially Private Data Releases Via Individual Ranking Microaggregation
Abstract
Being able to release and exploit open data gathered in information systems is crucial for researchers, enterprises and society as a whole. Yet these data must be anonymized before release to protect the privacy of the subjects to whom the records relate. Differential privacy is a privacy model for anonymization that offers more robust privacy guarantees than previous models, such as k-anonymity and its extensions. However, it is often disregarded that the utility of differentially private outputs is quite limited, either because of the amount of noise that must be added to obtain them or because utility is preserved only for a restricted type and/or a limited number of queries. By contrast, k-anonymity-like data releases make no assumptions about the uses of the protected data and thus do not restrict the number or type of feasible analyses. Recently, some authors have proposed mechanisms to offer general-purpose differentially private data releases. This paper extends such works with a specific focus on preserving the utility of the protected data. Our proposal builds on microaggregation-based anonymization, which is more flexible and utility-preserving than the alternative anonymization methods used in the literature, in order to reduce the amount of noise needed to satisfy differential privacy. In this way, we improve the utility of differentially private data releases. Moreover, the noise reduction we achieve does not depend on the size of the data set but only on the number of attributes to be protected, which is a more desirable behavior for large data sets. The utility benefits brought by our proposal are empirically evaluated and compared with related works for several data sets and metrics.
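To make the idea in the abstract concrete, the following is a minimal sketch of individual ranking microaggregation followed by Laplace noise on the group means. It is an illustration under stated assumptions, not the authors' implementation: the function names (`rank_groups`, `microaggregate`, `noisy_release`), the even split of the privacy budget across attributes, and the caller-supplied per-attribute `sensitivities` are all hypothetical. The abstract does not state the sensitivity values, and the sketch by itself is not a proof of differential privacy.

```python
import random


def rank_groups(values, k):
    """Split record indices into groups of k consecutive ranked values."""
    n = len(values)
    if k < 1 or n < k:
        raise ValueError("need 1 <= k <= len(values)")
    order = sorted(range(n), key=lambda i: values[i])
    groups, start = [], 0
    while start < n:
        # The last group absorbs the remainder so every group has >= k members.
        end = n if n - start < 2 * k else start + k
        groups.append(order[start:end])
        start = end
    return groups


def microaggregate(values, k):
    """Replace each value by the mean of its rank group (one attribute)."""
    out = [0.0] * len(values)
    for group in rank_groups(values, k):
        mean = sum(values[i] for i in group) / len(group)
        for i in group:
            out[i] = mean
    return out


def laplace_noise(scale, rng):
    # The difference of two i.i.d. exponentials with mean `scale`
    # follows a Laplace(0, scale) distribution.
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)


def noisy_release(columns, k, epsilon, sensitivities, seed=None):
    """Microaggregate each attribute independently, then perturb group means.

    Assumptions (not taken from the abstract): the budget epsilon is split
    evenly across attributes, and `sensitivities` holds a caller-supplied
    sensitivity bound for the microaggregated output of each attribute.
    """
    rng = random.Random(seed)
    per_attribute_epsilon = epsilon / len(columns)
    released = []
    for values, sensitivity in zip(columns, sensitivities):
        scale = sensitivity / per_attribute_epsilon
        out = [0.0] * len(values)
        for group in rank_groups(values, k):
            mean = sum(values[i] for i in group) / len(group)
            noisy_mean = mean + laplace_noise(scale, rng)
            for i in group:
                out[i] = noisy_mean
        released.append(out)
    return released
```

In this sketch each attribute is ranked and grouped on its own, so a record's values may fall into different groups for different attributes. Because the budget is divided among the attributes, the noise scale grows with the number of attributes rather than with the number of records, which is in line with the behavior the abstract describes.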
Similar resources
Differentially Private Local Electricity Markets
Privacy-preserving electricity markets have a key role in steering customers towards participation in local electricity markets by guaranteeing to protect their sensitive information. Moreover, these markets make it possible to statically release and share the market outputs for social good. This paper aims to design a market for local energy communities by implementing Differential Privacy (DP)...
Improving the Utility of Differential Privacy via Univariate Microaggregation
Differential privacy is a privacy model for anonymization that offers more robust privacy guarantees than previous models, such as k-anonymity and its extensions. However, it is often disregarded that the utility of differentially private outputs is quite limited, either because of the amount of noise that needs to be added to obtain them or because utility is only preserved for a restricted ty...
Data Utility in Differential Privacy via Microaggregation-based k-Anonymity
In addition to the general-purpose SSE-based utility evaluation conducted and discussed in the body of the article, in this appendix we provide evaluation results for a specific data use, namely counting queries. The reason for focusing on this data use is that many related works on differentially private data publishing aim at preserving the utility for counting queries over protected data [12–...
Statistical Disclosure Control for Data Privacy Preservation
With the phenomenal change in the way data are collected, stored and disseminated among data analysts, there is an urgent need to protect the privacy of data. When individual data are disseminated among various users, there is a high risk of revealing sensitive data about an individual, which may raise legal and ethical issues. Statistical Disclosure Control (SDC) ...
www.econstor.eu Estimation of a Linear Model under Microaggregation by Individual Ranking
Microaggregation by individual ranking is one of the most commonly applied disclosure control techniques for continuous microdata. The paper studies the effect of microaggregation by individual ranking on the least squares estimation of a multiple linear regression model in continuous variables. It is shown that the naive parameter estimates are asymptotically unbiased. Moreover, the naive leas...
Journal title:
- Information Fusion
Volume 30
Publication date: 2016