Outlier Protection in Continuous Microdata Masking

Authors

  • Josep Maria Mateo-Sanz
  • Francesc Sebé
  • Josep Domingo-Ferrer
Abstract

Masking methods protect data sets against disclosure by perturbing the original values before publication. Masking causes some information loss (masked data are not exactly the same as the original data) and does not completely eliminate the risk of disclosure for the individuals behind the data set. Information loss can be measured by observing the differences between original and masked data, while disclosure risk can be measured by means of record linkage and confidentiality intervals. Outliers in the original data set are particularly difficult to protect, as they correspond to extreme individuals who stand out from the rest. The objective of our work is to compare, for different masking methods, the information loss and disclosure risk related to outliers. In this way, the level of protection that different masking methods offer to extreme individuals can be evaluated.
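The two measurements named in the abstract can be illustrated with a small sketch. It is not the authors' procedure: it assumes a single continuous attribute, plain additive Gaussian noise as the masking method, a scaled mean absolute difference as the information-loss measure, nearest-value matching as the record-linkage attack, and "the k largest values" as the definition of an outlier. Confidentiality intervals are not covered. All function names and parameters are hypothetical.

```python
import random
import statistics


def mask_additive_noise(values, rel_sd, rng):
    """Mask values with zero-mean Gaussian noise (assumed masking method).

    The noise standard deviation is rel_sd times the standard deviation
    of the original values.
    """
    sd = statistics.pstdev(values)
    return [v + rng.gauss(0.0, rel_sd * sd) for v in values]


def information_loss(original, masked, indices):
    """Mean absolute original-vs-masked difference over the given records,
    divided by the standard deviation of the original attribute."""
    sd = statistics.pstdev(original)
    total = sum(abs(original[i] - masked[i]) for i in indices)
    return total / (len(indices) * sd)


def linkage_risk(original, masked, indices):
    """Share of the given masked records whose nearest original value
    is their own source record (distance-based record linkage)."""
    hits = 0
    for i in indices:
        nearest = min(range(len(original)),
                      key=lambda j: abs(original[j] - masked[i]))
        hits += int(nearest == i)
    return hits / len(indices)


def outlier_indices(values, k):
    """Indices of the k largest values (assumed outlier definition)."""
    return sorted(range(len(values)), key=lambda i: values[i])[-k:]


# Synthetic data: 200 ordinary records plus three extreme ones.
rng = random.Random(0)
original = [rng.gauss(50.0, 10.0) for _ in range(200)] + [150.0, 180.0, 220.0]
masked = mask_additive_noise(original, rel_sd=0.1, rng=rng)

everyone = list(range(len(original)))
extreme = outlier_indices(original, k=3)

print("information loss, all records:", information_loss(original, masked, everyone))
print("information loss, outliers:   ", information_loss(original, masked, extreme))
print("linkage risk, all records:    ", linkage_risk(original, masked, everyone))
print("linkage risk, outliers:       ", linkage_risk(original, masked, extreme))
```

In this toy setting the noise is small relative to the gaps between the extreme values, so the outliers are re-identified far more easily than records in the dense part of the distribution. That is the effect the abstract describes; the real comparison depends on the masking methods and measures actually used in the paper.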

Related resources

Simulation Study of the Effectiveness of Masking Microdata with Mixtures of Multivariate Normal Distributions

Continuous variables in microdata can be masked for protection from disclosure through the use of an additive noise. I consider adding noise that is distributed according to a mixture of normal distributions. There are several parameters involved in constructing the additive noise. The study’s purpose is to lay down as a guide a recipe for the choices of these parameters. The proportion of reid...

Microdata Protection

Governmental, public, and private organizations are more and more frequently required to make data available for external release in a selective and secure fashion. Most data are today released in the form of microdata, reporting information on individual respondents. The protection of microdata against improper disclosure is therefore an issue that has become increasingly important and will co...

Releasing Microdata: Disclosure Risk Estimation, Data Masking and Assessing Utility

Statistical agencies release sample microdata from social surveys under different modes of access ranging from Public Use Files (PUF) in the form of tables or highly perturbed datasets to Microdata Under Contract (MUC) for researchers and licensed institutions where levels of protection are less severe. In addition, statistical agencies often have on-site datalabs where registered researchers c...

An evolutionary approach to enhance data privacy

Dissemination of data with sensitive information about individuals has an implicit risk of unauthorized disclosure. Perturbative masking methods propose the distortion of the original data sets before publication, tackling a difficult tradeoff between data utility (low information loss) and protection against disclosure (low disclosure risk). In this paper we describe how information loss and d...

A multiplicative masking method for preserving the skewness of the original micro-records

Masking methods for the safe dissemination of microdata consist of distorting the original data while preserving a pre-defined set of statistical properties in the microdata. For continuous variables, available methodologies rely essentially on matrix masking and in particular on adding noise to the original values, using more or less refined procedures depending on the extent of information th...

Publication date: 2004