Multiply imputing missing values in data sets with mixed measurement scales using a sequence of generalised linear models
Abstract
Multiple imputation is a commonly used approach to deal with missing values. In this approach, an imputer repeatedly imputes the missing values by taking draws from the posterior predictive distribution for the missing values conditional on the observed values, and releases these completed data sets to analysts. With each completed data set the analyst performs the analysis of interest, treating the data as if it were fully observed. These analyses are then combined with standard combining rules, allowing the analyst to make appropriate inferences which take into account the uncertainty present due to the missing data. In order to preserve the statistical properties present in the data, the imputer must use a plausible distribution to generate the imputed values. In data sets containing variables with different measurement scales, e.g. some categorical and some continuous variables, Multivariate Imputation by Chained Equations (MICE) is a commonly used multiple imputation method. However, imputations from such an approach are not necessarily drawn from
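The combining step mentioned in the abstract can be illustrated with a minimal sketch. It assumes that the "standard combining rules" are Rubin's rules for a scalar estimate, which the abstract does not state explicitly. The function name `pool_rubin` and its inputs are hypothetical and not taken from the paper: each completed data set is assumed to yield one point estimate and one estimated variance.

```python
import math
from statistics import mean, variance


def pool_rubin(estimates, variances):
    """Pool m completed-data results for a scalar quantity (Rubin's rules).

    estimates: point estimates, one per completed data set
    variances: estimated variances of those estimates, same order
    Returns (pooled estimate, total variance, degrees of freedom).
    """
    m = len(estimates)
    if m < 2 or m != len(variances):
        raise ValueError("need at least two estimates and matching variances")

    q_bar = mean(estimates)      # pooled point estimate
    u_bar = mean(variances)      # within-imputation variance
    b = variance(estimates)      # between-imputation variance (divisor m - 1)
    t = u_bar + (1 + 1 / m) * b  # total variance

    # Reference t-distribution degrees of freedom; infinite when the
    # estimates agree exactly across imputations (b == 0).
    if b == 0:
        df = math.inf
    else:
        df = (m - 1) * (1 + u_bar / ((1 + 1 / m) * b)) ** 2
    return q_bar, t, df


if __name__ == "__main__":
    # Illustrative numbers only, not results from the paper.
    q, t, df = pool_rubin([1.0, 2.0, 3.0], [0.5, 0.5, 0.5])
    print(q, t, df)
```

The between-imputation term is what carries the uncertainty due to the missing data: the more the estimates differ across completed data sets, the larger the total variance becomes.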
Similar resources
Investigating the missing data effect on credit scoring rule based models: The case of an Iranian bank
Credit risk management is a process in which banks estimate the probability of default (PD) for each loan applicant. Data sets of previous loan applicants are built by gathering their data, and these internal data sets are usually completed using external credit bureau’s data and finally used for estimating PD in banks. There is also continuing interest among banks in using rule-based classifiers to b...
Data-driven methods for imputing national-level incidence in global burden of disease studies
OBJECTIVE: To develop transparent and reproducible methods for imputing missing data on disease incidence at the national level for the year 2005. METHODS: We compared several models for imputing missing country-level incidence rates for two foodborne diseases: congenital toxoplasmosis and aflatoxin-related hepatocellular carcinoma. Missing values were assumed to be missing at random. Predictor va...
Fitting multilevel multivariate models with missing data in responses and covariates that may include interactions and nonlinear terms
The paper extends existing models for multilevel multivariate data with mixed response types to handle quite general types and patterns of missing data values in a wide range of multilevel generalized linear models. It proposes an efficient Bayesian modelling approach that allows missing values in covariates, including models where there are interactions or other functions of covariates such as...
An Estimation of Missing Values by Modified Mixed Kernels
In statistical practices, difficulties of missing data are universal. Several techniques are used to handle this dilemma of missing data. They include both old approaches, which require only a small amount of mathematical computations and new approaches, which require additional difficult computations that are ever easier for social work researchers to carry out the statistical programming ...
Multiple imputation and other resampling schemes for imputing missing observations
The problem of imputing missing observations under the linear regression model is considered. It is assumed that observations are missing at random and all the observations on the auxiliary or independent variables are available. Estimates of the regression parameters based on singly and multiply imputed values are given. Jackknife as well as bootstrap estimates of the variance of the singly im...
Journal title:
- Computational Statistics & Data Analysis
Volume 95, Issue
Pages -
Publication date 2016