Comparison of imputation methods for handling missing covariate data when fitting a Cox proportional hazards model: a resampling study
Abstract
BACKGROUND: The appropriate handling of missing covariate data in prognostic modelling studies has yet to be conclusively determined. A resampling study was performed to investigate the effects of different missing data methods on the performance of a prognostic model.

METHODS: Observed data for 1000 cases were sampled with replacement from a large complete dataset of 7507 patients to obtain 500 replications. Five levels of missingness (ranging from 5% to 75%) were imposed on three covariates using a missing at random (MAR) mechanism. Five missing data methods were applied: (a) complete case analysis (CC); (b) single imputation using regression switching with predictive mean matching (SI); (c) multiple imputation (MI) using regression switching imputation; (d) MI using regression switching with predictive mean matching (MICE-PMM); and (e) MI using flexible additive imputation models. A Cox proportional hazards model was fitted to each dataset, and estimates of the regression coefficients and model performance measures were obtained.

RESULTS: CC produced biased regression coefficient estimates and inflated standard errors (SEs) with 25% or more missingness. The underestimated SEs after SI resulted in poor coverage with 25% or more missingness. Of the MI approaches investigated, MICE-PMM produced the least biased estimates and better model performance measures. However, this MI approach still produced biased regression coefficient estimates with 75% missingness.

CONCLUSIONS: Very few differences were seen between the results from all missing data approaches with 5% missingness. However, MI using MICE-PMM may be the preferred approach for handling between 10% and 50% MAR missingness.
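The resampling design in METHODS (draw 1000 cases with replacement, then impose MAR missingness) can be illustrated with a minimal sketch. The abstract does not say how the MAR mechanism was generated, so everything below is an assumption made for illustration: the logistic missingness model, the `slope` parameter, the covariate names `age` and `marker`, and the synthetic "complete" dataset. Only one covariate is blanked here, whereas the study imposed missingness on three, and only 3 replications are drawn instead of 500 to keep the run short.

```python
import math
import random


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def resample(rows, n, rng):
    """Draw n rows with replacement (copied, so later edits do not alias)."""
    return [dict(rng.choice(rows)) for _ in range(n)]


def impose_mar(rows, target, driver, rate, rng, slope=1.0):
    """Set `target` to None with a probability that depends only on `driver`.

    `driver` stays fully observed, which is what makes the mechanism MAR.
    The logistic form and `slope` are illustrative assumptions.
    """
    xs = [r[driver] for r in rows]
    lo, hi = -50.0, 50.0
    for _ in range(60):  # bisection: find the intercept giving mean P(missing) = rate
        mid = (lo + hi) / 2.0
        mean_p = sum(sigmoid(mid + slope * x) for x in xs) / len(xs)
        if mean_p < rate:
            lo = mid
        else:
            hi = mid
    intercept = (lo + hi) / 2.0
    for r in rows:
        if rng.random() < sigmoid(intercept + slope * r[driver]):
            r[target] = None
    return rows


rng = random.Random(2024)

# Hypothetical stand-in for the complete dataset of 7507 patients.
complete = [{"age": rng.gauss(0.0, 1.0), "marker": rng.gauss(0.0, 1.0)}
            for _ in range(7507)]

# The study used 500 replications; 3 are drawn here for brevity.
replicates = [
    impose_mar(resample(complete, 1000, rng), "marker", "age", 0.25, rng)
    for _ in range(3)
]

for rep in replicates:
    frac = sum(r["marker"] is None for r in rep) / len(rep)
    print(f"fraction of 'marker' missing: {frac:.3f}")
```

Each replicate would then be passed to the missing data method under comparison before fitting the Cox model; that step is omitted here because it depends on the imputation software used.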
Related articles
A Simulation Study Comparing Two Methods of Handling Missing Covariate Values when Fitting a Cox Proportional-Hazards Regression Model
Missing covariate values are a common problem in survival data research. The aim of this study is to compare the multiple imputation (MI) and last observation carried forward (LOCF) methods for handling missing covariate values in the Cox proportional hazards (PH) regression model. The comparisons between the methods are based on simulated data. The missingness mechanism is assumed ...
Comparison of techniques for handling missing covariate data within prognostic modelling studies: a simulation study
BACKGROUND There is no consensus on the most appropriate approach to handle missing covariate data within prognostic modelling studies. Therefore a simulation study was performed to assess the effects of different missing data techniques on the performance of a prognostic model. METHODS Datasets were generated to resemble the skewed distributions seen in a motivating breast cancer example. Mu...
Maximum likelihood analysis of the logistic regression model when predictor variable data are incomplete but auxiliary variables are available
Background and Objectives: Missing data exist in many studies, e.g. in regression models, and they decrease the model's efficacy. Many methods have been suggested for handling incomplete data; these have generally focused on missing outcome values, but covariate values can also be missing. Materials and Methods: In this paper we study the imputation of missing values by the EM algorithm and auxiliary varia...
Missing Binary Covariate Data and Imputation in Regression Models
This paper presents a simple way to handle missing values in categorical covariates, namely conditional probability imputation. Properties of this technique are given for various patterns of missing data in regression studies. An example shows its use in the proportional hazards model. The probability imputation technique is furthermore compared with multiple imputation and model-based appro...
Imputing missing covariate values for the Cox model
Multiple imputation is commonly used to impute missing data, and is typically more efficient than complete cases analysis in regression analysis when covariates have missing values. Imputation may be performed using a regression model for the incomplete covariates on other covariates and, importantly, on the outcome. With a survival outcome, it is a common practice to use the event indicator D ...