sdcMicro: a new flexible R-package for the generation of anonymised microdata: Design issues and new methods

Author

  • Matthias Templ
Abstract

Data protection specialists need flexible software tools for the exploratory use of protection methods to generate high-quality confidential data. Microdata protection is widely used and is often the only possible way to provide data to both researchers and users. In this paper we present a methodological and computational framework for the generation of anonymised microdata and give insights into the R package sdcMicro developed for this purpose. This package may become the standard software for microdata protection since it is very flexible, easy to use and contains all popular methods, plus some new ones. The package can also be used to compare methods, and to compare original with perturbed data, not only by measuring information loss but also through various comparison plots.


Similar resources

WP.31 ENGLISH ONLY UNITED NATIONS STATISTICAL COMMISSION and ECONOMIC COMMISSION FOR EUROPE CONFERENCE OF EUROPEAN STATISTICIANS EUROPEAN COMMISSION STATISTICAL OFFICE OF THE EUROPEAN COMMUNITIES (EUROSTAT)

Data protection specialists need flexible software tools for the exploratory use of protection methods to generate high quality confidential data. Microdata protection is widely used and is often the only possible way to provide data to both researchers and users. In this paper we present a methodological and computational framework for the generation of anonymised microdata and give insights t...


Statistical Disclosure Control for Microdata Using the R-Package sdcMicro

The demand for data from surveys, censuses or registers containing sensitive information on people or enterprises has increased significantly over the last years. However, before data can be provided to the public or to researchers, confidentiality has to be respected for any data set possibly containing sensitive information about individual units. Confidentiality can be achieved by applying sta...


A Graphical User Interface for Microdata Protection Which Provides Reproducibility and Interactions: the sdcMicro GUI

The proposed graphical user interface (GUI) for microdata protection serves as an easy-to-handle tool for users who want to use the sdcMicro package for statistical disclosure control but are not familiar with the native R command line interface. In addition, the GUI provides interactions between the objects that result from the anonymization process. This allows an automated rec...


Robust Statistics Meets SDC: New Disclosure Risk Measures for Continuous Microdata Masking

Abstract. The aim of this study is to evaluate the risk of re-identification related to distance-based disclosure risk measures for numerical variables. First, we review previously proposed disclosure risk measures. Unfortunately, none of these measures accounts for outliers. We assume that outliers must be protected more than observations near the center of the data cloud. Therefore...


Why Shuffle When You Can Use Robust Statistics for SDC - A Simulation Study

Abstract. The aim of this study was to compare different microdata protection methods for numerical variables under various conditions. Most of the 21 methods used in this paper have been implemented in the R-package sdcMicro, which is available for free on the Comprehensive R Archive Network (http://cran.r-project.org). The rest of the methods used can easily be applied within other R-packages. Wh...




Publication date: 2009