Assessing Microdata Disclosure Risk Using the Poisson-inverse Gaussian Distribution
Abstract
An important measure of the identification risk associated with releasing microdata or large complex tables is the number or proportion of population units that can be uniquely identified by a set of characterizing attributes that partition the population into subpopulations, or cells. Various methods for estimating this quantity from sample data by means of superpopulation models have been proposed in the literature. This paper proposes the Poisson-inverse Gaussian (PiG) distribution as a possible approach in this context. Disclosure risk measures are discussed and derived under the proposed model, as are various methods of estimation. An example using real data is given, and the results indicate that the PiG model may be a useful alternative to other models.
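As a rough illustration of how population uniqueness could be computed under a PiG superpopulation model, the sketch below evaluates the PiG probability mass function and the expected number of population uniques. It rests on assumptions not stated in the abstract: cell sizes are treated as independent PiG variables, the distribution is parametrized by its mean `mu` and a dispersion parameter `beta` (variance `mu * (1 + beta)`), and the function names are hypothetical. The paper's own risk measures and estimators may differ.

```python
import math


def pig_pmf(mu, beta, n_max):
    """Return [P(F=0), ..., P(F=n_max)] for a Poisson-inverse Gaussian
    variable F with mean mu and dispersion beta (assumed parametrization,
    giving variance mu * (1 + beta))."""
    root = math.sqrt(1.0 + 2.0 * beta)
    # P(F=0) is the probability generating function evaluated at zero.
    probs = [math.exp((mu / beta) * (1.0 - root))]
    if n_max >= 1:
        probs.append(mu / root * probs[0])
    # Higher-order probabilities follow a two-term recursion.
    for n in range(2, n_max + 1):
        probs.append(
            (2.0 * beta / (1.0 + 2.0 * beta)) * (1.0 - 3.0 / (2.0 * n)) * probs[n - 1]
            + mu ** 2 / ((1.0 + 2.0 * beta) * n * (n - 1)) * probs[n - 2]
        )
    return probs


def expected_population_uniques(num_cells, mu, beta):
    """Expected number of cells of size one, assuming num_cells
    independent PiG cell sizes (an illustrative risk measure)."""
    return num_cells * pig_pmf(mu, beta, 1)[1]


def expected_unique_fraction(mu, beta):
    """Expected uniques divided by expected population size, P(F=1) / mu."""
    return pig_pmf(mu, beta, 1)[1] / mu


if __name__ == "__main__":
    # Hypothetical values: 10,000 cells with an average cell size of 2.
    print(expected_population_uniques(10_000, mu=2.0, beta=1.5))
    print(expected_unique_fraction(mu=2.0, beta=1.5))
```

In practice `mu` and `beta` would have to be estimated from the sample, which this sketch does not attempt.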
Related resources
ACRONYM: Data without Boundaries D
Disclosure limitation methods for protecting the confidentiality of respondents in survey microdata often use perturbative techniques which introduce measurement error into the categorical identifying variables. In addition, the data itself will often have measurement errors commonly arising from survey processes. There is a need for valid and practical ways to assess the protect...
Assessing the Protection Provided by Misclassification-based Disclosure Limitation Methods for Survey Microdata
Government statistical agencies often apply statistical disclosure limitation techniques to survey microdata to protect the confidentiality of respondents. There is a need for valid and practical ways to assess the protection provided. This paper develops some simple methods for disclosure limitation techniques which perturb the values of categorical identifying variables. The methods are appli...
On the relation between logarithmic series model and other superpopulation models useful for microdata disclosure risk assessment
Fisher’s logarithmic series model (Fisher et al. (1943)) is a classical model in statistical ecology. In this paper we show that this model is a key model linking three models discussed in Takemura (1997), i.e., Poisson-gamma model (Bethlehem et al. (1990)), Dirichlet-multinomial model (Takemura (1997)), and Ewens model (Ewens (1990)). This connection opens up the possibility of applying existi...
Individual Disclosure Risk Measures Based on Log-Linear Models
Dissemination of microdata files should be constrained to the confidentiality pledge under which a statistical agency collects survey data. To protect the confidentiality of respondents, statistical agencies perform a two-stage statistical disclosure control procedure. In the first stage, with respect to a disclosure scenario, the risk of disclosure of each unit is estimated. After the removal ...
Assessing Disclosure Risk for Record Linkage
An intruder seeks to match a microdata file to an external file using a record linkage technique. The identification risk is defined as the probability that a match is correct. The nature of this probability and its estimation are explored. Some connections are made to the literature on disclosure risk based on the notion of population uniqueness.