General Discrete-data Modeling Methods for Producing Synthetic Data with Reduced Re-identification Risk that Preserve Analytic Properties
Abstract
General modeling methods for representing and improving the quality of discrete data (Winkler 2003, 2008) extend and connect the editing methods of Fellegi and Holt (1976) and the imputation ideas of Little and Rubin (2002). This paper describes a modeling framework for producing synthetic microdata that correspond more closely to external benchmark constraints on certain aggregates (such as margins) and in which certain cell probabilities are bounded both below and above to reduce re-identification risk. Rather than use linear constraints (Meng and Rubin 1993), the modeling methods use convex constraints (Winkler 1990, 1993) in an extended MCECM procedure. Although the produced microdata are not epsilon-private (Dwork 2006, Dwork and Yekhanin 2008), surrogate original microdata would be exceptionally difficult (or impossible) to construct using the standard linear programming (LP) procedures of epsilon-privacy.
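The idea of bounding cell probabilities both below and above can be illustrated with a small sketch. This is not the paper's extended MCECM procedure. It is a simplified stand-in that assumes strictly positive fitted cell probabilities, rescales them by a common factor, clips them to the bounds, and then draws synthetic records. The function name `bound_cell_probs`, the example probabilities, and the bounds `lo` and `hi` are all hypothetical.

```python
import numpy as np

def bound_cell_probs(p, lo, hi, iters=200):
    """Return q = clip(c * p, lo, hi), with c chosen so that q sums to 1.

    Assumes every entry of p is strictly positive (no structural zeros).
    """
    p = np.asarray(p, dtype=float)
    if np.any(p <= 0):
        raise ValueError("this sketch assumes strictly positive probabilities")
    n = p.size
    if not (n * lo <= 1.0 <= n * hi):
        raise ValueError("bounds are infeasible for this number of cells")
    # The clipped total is nondecreasing in c, so bisect on c.
    c_lo, c_hi = 0.0, hi / p.min()  # at c_hi every cell sits at the upper bound
    for _ in range(iters):
        c = 0.5 * (c_lo + c_hi)
        if np.clip(c * p, lo, hi).sum() < 1.0:
            c_lo = c
        else:
            c_hi = c
    q = np.clip(c_hi * p, lo, hi)
    return q / q.sum()  # remove residual floating-point error

# Hypothetical fitted probabilities for five cells; the last two are "small cells".
p = np.array([0.70, 0.25, 0.04, 0.009, 0.001])
q = bound_cell_probs(p, lo=0.01, hi=0.6)

# Draw 1000 synthetic records as cell indices from the bounded distribution.
rng = np.random.default_rng(0)
cells = rng.choice(q.size, size=1000, p=q)
```

Raising very small cell probabilities to a floor means a rare pattern in the synthetic file no longer points to a unique original record. Capping large ones keeps the total mass at one. A procedure that also matches benchmark margins would need to enforce those constraints jointly, which this sketch does not attempt.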
Similar resources
Analytically Valid Discrete Microdata Files and Re-identification
Loglinear modeling methods have become quite straightforward to apply to discrete data X. A good-fitting loglinear model can be used to generate synthetic copies X1, ..., Xn of X that preserve analytic properties but may allow re-identification of small cells. With fitting algorithms that use more general convex constraints and are designed to deal with missing data, we are able to disperse t...
Masking and Re-identification Methods for Public-Use Microdata: Overview and Research Problems
This paper provides an overview of methods of masking microdata so that the data can be placed in public-use files. It divides the methods according to whether they have been demonstrated to provide analytic properties or not. For those methods that have been shown to provide one or two sets of analytic properties in the masked data, we indicate where the data may have limitations for most anal...
General Methods and Algorithms for Modeling and Imputing Discrete Data under a Variety of Constraints
Loglinear modeling methods have become quite straightforward to apply to discrete data X. The models for missing data involve minor extensions of hot-deck methods (Little and Rubin 2002). Edits are structural zeros that forbid certain patterns. Winkler (2003) provided the theory for connecting edit with imputation. In this paper, we give methods and algorithms for modeling/edit/imputation under...
Time-Varying Modeling of Systematic Risk: using High-Frequency Characterization of Tehran Stock Exchange
We decompose the time-varying beta of a stock into a beta for continuous systematic risk and a beta for discontinuous systematic risk. Price movements are assumed to follow Brownian motion in our modeling. Our empirical research is based on high-frequency data for stocks from the Tehran Stock Exchange. Our market portfolio experiences jumps on 136 of 243 trading days, which is a considerable rat...
Global Stabilization of Attitude Dynamics: SDRE-based Control Laws
The State-Dependent Riccati Equation (SDRE) method has frequently been used to design suboptimal controllers for nonlinear dynamic systems. Different methods for local stability analysis of SDRE-controlled systems of order greater than two, such as the attitude dynamics of a general rigid body, have been extended in the literature; however, it is still difficult to show global stability properties of...