Comparison of imputation methods for handling missing covariate data when fitting a Cox proportional hazards model: a resampling study

Authors

  • Andrea Marshall
  • Douglas G Altman
  • Roger L Holder
Abstract

BACKGROUND: The appropriate handling of missing covariate data in prognostic modelling studies is yet to be conclusively determined. A resampling study was performed to investigate the effects of different missing data methods on the performance of a prognostic model.

METHODS: Observed data for 1000 cases were sampled with replacement from a large complete dataset of 7507 patients to obtain 500 replications. Five levels of missingness (ranging from 5% to 75%) were imposed on three covariates using a missing at random (MAR) mechanism. Five missing data methods were applied: (a) complete case analysis (CC); (b) single imputation using regression switching with predictive mean matching (SI); (c) multiple imputation using regression switching imputation; (d) multiple imputation using regression switching with predictive mean matching (MICE-PMM); and (e) multiple imputation using flexible additive imputation models. A Cox proportional hazards model was fitted to each dataset, and estimates of the regression coefficients and model performance measures were obtained.

RESULTS: CC produced biased regression coefficient estimates and inflated standard errors (SEs) with 25% or more missingness. The underestimated SE after SI resulted in poor coverage with 25% or more missingness. Of the MI approaches investigated, MI using MICE-PMM produced the least biased estimates and better model performance measures. However, this MI approach still produced biased regression coefficient estimates with 75% missingness.

CONCLUSIONS: Very few differences were seen between the results from all missing data approaches with 5% missingness. However, performing MI using MICE-PMM may be the preferred missing data approach for handling between 10% and 50% MAR missingness.
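The resampling and MAR steps described in the methods can be illustrated with a minimal sketch. It is not the authors' procedure: the abstract does not state how the MAR mechanism was specified, so the logistic dependence on a fully observed covariate, the covariate names (`age`, `size`), the synthetic data, and the helper names `resample_cases` and `impose_mar` are all assumptions made for illustration.

```python
import math
import random


def resample_cases(data, n_cases, rng):
    """Draw n_cases rows with replacement from the complete dataset."""
    return [rng.choice(data) for _ in range(n_cases)]


def impose_mar(rows, target, driver, fraction, rng, slope=1.0):
    """Blank out `target` with a probability that depends only on `driver`.

    Assumption: P(missing) = logistic(a + slope * driver). The intercept `a`
    is found by bisection so that the average probability equals `fraction`.
    Because `driver` stays fully observed, the mechanism is MAR.
    """
    xs = [r[driver] for r in rows]

    def mean_prob(a):
        return sum(1.0 / (1.0 + math.exp(-(a + slope * x))) for x in xs) / len(xs)

    lo, hi = -20.0, 20.0
    for _ in range(60):  # bisection on the intercept
        mid = (lo + hi) / 2.0
        if mean_prob(mid) < fraction:
            lo = mid
        else:
            hi = mid
    a = (lo + hi) / 2.0

    out = []
    for r in rows:
        p = 1.0 / (1.0 + math.exp(-(a + slope * r[driver])))
        new_row = dict(r)  # copy so the complete dataset is left untouched
        if rng.random() < p:
            new_row[target] = None
        out.append(new_row)
    return out


rng = random.Random(0)
# Synthetic stand-in for the complete dataset of 7507 patients (hypothetical values).
full = [{"age": rng.gauss(0, 1), "size": rng.gauss(0, 1)} for _ in range(7507)]

sample = resample_cases(full, 1000, rng)                    # one replication
with_missing = impose_mar(sample, "size", "age", 0.25, rng)  # about 25% missing

# Complete case analysis (CC) would keep only the fully observed rows.
complete_cases = [r for r in with_missing if r["size"] is not None]
missing_fraction = 1 - len(complete_cases) / len(with_missing)
```

In a full study this would be repeated for each of the 500 replications and each missingness level, with a Cox model fitted to the completed or complete-case data each time; that part is omitted here.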


Similar resources

A Simulation Study Comparing Two Methods of Handling Missing Covariate Values when Fitting a Cox Proportional- Hazards Regression Model

Missing covariate values are a common problem in survival data research. The aim of this study is to compare the use of the multiple imputation (MI) and last observation carried forward (LOCF) methods for handling missing covariate values in the Cox proportional hazards (PH) regression model. The comparisons between the methods are based on simulated data. The missingness mechanism is assumed ...


Comparison of techniques for handling missing covariate data within prognostic modelling studies: a simulation study

BACKGROUND There is no consensus on the most appropriate approach to handle missing covariate data within prognostic modelling studies. Therefore a simulation study was performed to assess the effects of different missing data techniques on the performance of a prognostic model. METHODS Datasets were generated to resemble the skewed distributions seen in a motivating breast cancer example. Mu...


Maximum likelihood analysis of the logistic regression model when predictor variable data are incomplete but auxiliary variables are available

Background and Objectives: Missing data exist in many studies, e.g. in regression models, and they decrease the model's efficacy. Many methods have been suggested for handling incomplete data; they have generally focused on missing outcome values, but covariate values can also be missing. Materials and Methods: In this paper we study the missing imputation by the EM algorithm and auxiliary varia...


Missing Binary Covariate Data and Imputation in Regression Models

This paper presents a simple way to handle missing values in categorical covariates, namely conditional probability imputation. Properties of this technique are given for various patterns of missing data in regression studies. An example shows its use in the proportional hazards model. The probability imputation technique is furthermore compared with multiple imputation and model-based appro...


Imputing missing covariate values for the Cox model

Multiple imputation is commonly used to impute missing data, and is typically more efficient than complete cases analysis in regression analysis when covariates have missing values. Imputation may be performed using a regression model for the incomplete covariates on other covariates and, importantly, on the outcome. With a survival outcome, it is a common practice to use the event indicator D ...



Journal title:

Volume 10, issue

Pages -

Publication date: 2010