Analytically Valid Discrete Microdata Files and Re-identification

Author

  • William E. Winkler
Abstract

Loglinear modeling methods have become quite straightforward to apply to discrete data X. A good-fitting loglinear model can be used to generate synthetic copies X1, ..., Xn of X that preserve analytic properties but may allow re-identification of small cells. With fitting algorithms that use more general convex constraints and are designed to deal with missing data, we are able to disperse the counts associated with small cells over other cells in a manner that reduces re-identification risk while still maintaining most analytic properties.
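
The first step described in the abstract, fitting a loglinear model and drawing synthetic copies from it, can be illustrated with a minimal sketch. Everything here is an assumption made for illustration: the 2x2x2 table `observed` is hypothetical, the model is the no-three-way-interaction model, and the fit uses standard iterative proportional fitting rather than the paper's more general convex-constraint algorithms. The function names `fit_loglinear_ipf` and `synthetic_copies` are likewise invented for this example.

```python
import numpy as np

def fit_loglinear_ipf(table, margins, n_iter=1000, tol=1e-10):
    """Fit a hierarchical loglinear model by iterative proportional fitting.

    table   : observed contingency table of counts
    margins : list of axis tuples whose marginal totals the fit must match
    """
    table = np.asarray(table, dtype=float)
    # Start from a uniform table with the same grand total.
    fitted = np.full(table.shape, table.sum() / table.size)
    all_axes = set(range(table.ndim))
    for _ in range(n_iter):
        previous = fitted.copy()
        for keep in margins:
            # Sum out every axis that is not part of this margin.
            drop = tuple(sorted(all_axes - set(keep)))
            target = table.sum(axis=drop, keepdims=True)
            current = fitted.sum(axis=drop, keepdims=True)
            ratio = np.divide(target, current,
                              out=np.zeros_like(target), where=current > 0)
            fitted = fitted * ratio
        if np.abs(fitted - previous).max() < tol:
            break
    return fitted

def synthetic_copies(fitted, n_copies, seed=0):
    """Draw synthetic tables by multinomial sampling from the fitted cell probabilities."""
    rng = np.random.default_rng(seed)
    probs = (fitted / fitted.sum()).ravel()
    total = int(round(fitted.sum()))
    draws = rng.multinomial(total, probs, size=n_copies)
    return draws.reshape((n_copies,) + fitted.shape)

# Hypothetical 2x2x2 table; the cell with count 1 is a "small cell".
observed = np.array([[[20, 15], [12, 1]],
                     [[18, 22], [9, 14]]])
two_way = [(0, 1), (0, 2), (1, 2)]  # all two-way margins, no three-way term
fitted = fit_loglinear_ipf(observed, two_way)
copies = synthetic_copies(fitted, n_copies=3)
```

The fitted table reproduces the two-way margins of `observed`, which is the sense in which the synthetic copies preserve analytic properties under this model. The sketch does not attempt the dispersal of small-cell counts that the abstract describes; that step relies on the additional convex constraints, which are not specified here.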


Similar resources

Producing Public-use Microdata That Are Analytically Valid and Confidential

A public-use microdata file should be analytically valid. For a very small number of uses, the microdata should yield analytic results that are approximately the same as the original, confidential file that is not distributed. If the microdata file contains a moderate number of variables and is required to meet a single set of analytic needs of, say, university researchers, then many more recor...


Re-identification Methods for Evaluating the Confidentiality of Analytically Valid Microdata

Disclaimer: This report is released to inform interested parties of ongoing research and to encourage discussion of work in progress. The views expressed are those of the author and not necessarily those of the U.S. Census Bureau. A public-use microdata file should be analytically valid. For a very small number of uses, the microdata should yield analytic results that are approximately the same...


Masking and Re-identification Methods for Public-Use Microdata: Overview and Research Problems

This paper provides an overview of methods of masking microdata so that the data can be placed in public-use files. It divides the methods according to whether they have been demonstrated to provide analytic properties or not. For those methods that have been shown to provide one or two sets of analytic properties in the masked data, we indicate where the data may have limitations for most anal...


General Discrete-data Modeling Methods for Producing Synthetic Data with Reduced Re-identification Risk that Preserve Analytic Properties

General modeling methods for representing and improving the quality of discrete data (Winkler 2003, 2008) extend and connect the editing methods of Fellegi and Holt (1976) and the imputation ideas of Little and Rubin (2002). This paper describes a modeling framework to produce synthetic microdata that better corresponds to external benchmark constraints on certain aggregates (such as margins) a...


Comparing SDC Methods for Microdata on the Basis of Information Loss and Disclosure Risk

We present in this paper the first empirical comparison of SDC methods for microdata which encompasses both continuous and categorical microdata. Based on re-identification experiments, we try to optimize the tradeoff between information loss and disclosure risk. First, relevant SDC methods for continuous and categorical microdata are identified. Then generic information loss measures (not targ...




Publication date: 2007