Multiple Imputation for Incomplete Data With Semicontinuous Variables
Authors
Abstract
We consider the application of multiple imputation to data containing not only partially missing categorical and continuous variables, but also partially missing ‘semicontinuous’ variables (variables that take on a single discrete value with positive probability but are otherwise continuously distributed). As an imputation model for data sets of this type, we introduce an extension of the standard general location model proposed by Olkin and Tate; our extension, the blocked general location model, provides a robust and general strategy for handling partially observed semicontinuous variables. In particular, we incorporate a two-level model for the semicontinuous variables into the general location model. The first level models the probability that the semicontinuous variable takes on its point mass value, and the second level models the distribution of the variable given that it is not at its point mass. In addition, we introduce EM and data augmentation algorithms for the blocked general location model with missing data; these can be used to generate imputations under the proposed model and have been implemented in publicly available software. We illustrate our model and computational methods via a simulation study and an analysis of a survey of Massachusetts Megabucks Lottery winners.
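The two-level idea in the abstract can be illustrated with a minimal sketch. This is not the blocked general location model or the authors' EM and data augmentation algorithms: it handles a single semicontinuous variable with no covariates, and it assumes a Beta(1, 1) prior for the point-mass probability, a normal model with a standard noninformative prior for the continuous part, and at least two observed values away from the point mass. The function name `impute_semicontinuous` and its parameters are hypothetical.

```python
import numpy as np

def impute_semicontinuous(y, point_mass=0.0, rng=None):
    """Fill NaNs in y with draws from a simple two-level model (illustrative only)."""
    rng = np.random.default_rng() if rng is None else rng
    y = np.asarray(y, dtype=float)
    missing = np.isnan(y)
    observed = y[~missing]
    at_mass = observed == point_mass
    cont = observed[~at_mass]  # observed values away from the point mass

    # Level 1: probability of the point mass, drawn from its Beta posterior
    # (assumed uniform Beta(1, 1) prior).
    p = rng.beta(1 + at_mass.sum(), 1 + (~at_mass).sum())

    # Level 2: normal model for the continuous part; draw the variance and
    # then the mean from their posterior (assumed noninformative prior).
    n = cont.size
    sigma2 = (n - 1) * cont.var(ddof=1) / rng.chisquare(n - 1)
    mu = rng.normal(cont.mean(), np.sqrt(sigma2 / n))

    # Impute: first decide point mass vs. continuous, then draw a value.
    k = int(missing.sum())
    is_mass = rng.random(k) < p
    draws = rng.normal(mu, np.sqrt(sigma2), size=k)
    out = y.copy()
    out[missing] = np.where(is_mass, point_mass, draws)
    return out

# Usage: m = 5 completed copies of one partially observed variable.
y = np.array([0.0, 0.0, 1.5, 2.3, np.nan, 3.1, np.nan, 0.0, 4.2, np.nan])
rng = np.random.default_rng(0)
completed = [impute_semicontinuous(y, rng=rng) for _ in range(5)]
```

Each call redraws the parameters before imputing, so repeated calls give completed data sets that reflect parameter uncertainty, which is what multiple imputation requires. A model in the spirit of the abstract would additionally condition both levels on the other categorical and continuous variables.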
Related resources
Selection of Variables that Influence Drug Injection in Prison: Comparison of Methods with Multiple Imputed Data Sets
Background: Prisoners, compared to the general population, are at greater risk of infection. Drug injection is the main route of HIV transmission, in particular in Iran. What would be of interest is to determine variables that govern drug injection among prisoners. However, one of the issues that challenge model building is incomplete national data sets. In this paper, we addressed the process ...
Modeling and imputation of semicontinuous survey variables
Semicontinuous variables have a proportion of responses at some fixed value and a continuous distribution among the remaining responses. Variables of this type occur in economic surveys of individuals or establishments (e.g. specific types of income or expenditures) where distributions are frequently characterized by a mixture of zeros and continuously distributed positive numbers. In this pape...
Accuracy evaluation of different statistical and geostatistical censored data imputation approaches (Case study: Sari Gunay gold deposit)
Most geochemical datasets include missing data in varying proportions, which may cause significant problems in geostatistical modeling or multivariate analysis of the data. It is therefore common to impute the missing data in geochemical studies. In this study, three approaches called half detection (HD), multiple imputation (MI), and the cosimulation based on Markov model 2...
Handling Incomplete High Dimensional Multivariate Longitudinal Data by Multiple Imputation Using a Longitudinal Factor Analysis Model
Longitudinal data sets often suffer from missing values. Because of the large number of variables in these data sets, even a small rate of missingness on some variables can result in a large number of incomplete cases. Multiple imputation (Rubin, 1996; Rubin and Schenker, 1986) is often used to handle missing data problems. When producing multiple imputations for the missing val...
The Classical Linear Regression Model with one Incomplete Binary Variable
We present three different methods based on the conditional mean imputation when binary explanatory variables are incomplete. Apart from the single imputation and multiple imputation, especially the so-called pi imputation is presented as a new procedure. Seven procedures are compared in a simulation experiment when missing data are confined to one independent binary variable: complete case analysi...