An efficient stochastic search for Bayesian variable selection with high-dimensional correlated predictors

Authors

  • Deukwoo Kwon
  • Maria Teresa Landi
  • Marina Vannucci
  • Haleem J. Issaq
  • DaRue Prieto
  • Ruth M. Pfeiffer
Abstract

We present a Bayesian variable selection method for the setting in which the number of independent variables or predictors in a particular dataset is much larger than the available sample size. While most existing methods allow some degree of correlations among predictors but do not consider these correlations for variable selection, our method accounts for correlations among the predictors in variable selection. Our correlation-based stochastic search (CBS) method, the hybrid-CBS algorithm, extends a popular search algorithm for high-dimensional data, the stochastic search variable selection (SSVS) method. Similar to SSVS, we search the space of all possible models using variable addition, deletion or swap moves. However, our moves through the model space are designed to accommodate correlations among the variables. We describe our approach for continuous, binary, ordinal, and count outcome data. The impact of choices of prior distributions and hyper-parameters is assessed in simulation studies. We also examined performance of variable selection and prediction as the correlation structure of the predictors varies. We found that the hybrid-CBS resulted in lower prediction errors and better identified the true outcome associated predictors than SSVS when predictors were moderately to highly correlated. We illustrate the method on data from a proteomic profiling study of melanoma, a skin cancer.
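The add, delete, and swap moves mentioned in the abstract can be illustrated with a minimal sketch. It assumes a model is encoded as a list of 0/1 inclusion indicators (called `gamma` here) and that the move type and the affected predictors are chosen uniformly at random. The function name `propose_move` and these choices are illustrative assumptions, not the authors' implementation. In particular, the abstract does not specify how hybrid-CBS uses the correlations among predictors to shape its moves, so that part is not reproduced, and the step that accepts or rejects a proposed model is omitted.

```python
import random


def propose_move(gamma, rng):
    """Propose a neighbouring model by an add, delete, or swap move.

    gamma is a non-empty list of 0/1 inclusion indicators, one per predictor.
    Returns (new_gamma, move_name). Move types that are impossible for the
    current model (for example, delete from the empty model) are not offered.
    Uniform selection is an assumption made for this sketch.
    """
    included = [j for j, g in enumerate(gamma) if g == 1]
    excluded = [j for j, g in enumerate(gamma) if g == 0]

    moves = []
    if excluded:
        moves.append("add")
    if included:
        moves.append("delete")
    if included and excluded:
        moves.append("swap")

    move = rng.choice(moves)
    new_gamma = list(gamma)  # copy so the current model is left unchanged
    if move == "add":
        new_gamma[rng.choice(excluded)] = 1
    elif move == "delete":
        new_gamma[rng.choice(included)] = 0
    else:  # swap: drop one included predictor and bring in one excluded predictor
        new_gamma[rng.choice(included)] = 0
        new_gamma[rng.choice(excluded)] = 1
    return new_gamma, move


# Example: one proposal from a model that includes predictors 0 and 3.
rng = random.Random(0)
candidate, move = propose_move([1, 0, 0, 1, 0], rng)
```

In a full stochastic search, each candidate would be compared with the current model through their posterior probabilities before deciding whether to move. A correlation-aware variant would replace the uniform choices above with choices informed by the predictor correlation structure.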


Similar resources

Bayesian variable selection in quantile regression

In many applications, interest focuses on assessing relationships between predictors and the quantiles of the distribution of a continuous response. For example, in epidemiology studies, cutoffs to define premature delivery have been based on the 10th percentile of the distribution for gestational age at delivery. Using quantile regression, one can assess how this percentile varies with predict...


Working Paper M09/04 Methodology Two Level Stochastic Search Variable Selection In GLMs With Missing Predictors

Stochastic search variable selection (SSVS) algorithms provide an appealing and widely used approach for searching for good subsets of predictors, while simultaneously estimating posterior model probabilities and model-averaged predictive distributions. This article proposes a two-level generalization of SSVS to account for missing predictors, while accommodating uncertainty in the relationship...


Bayesian Variable Selection in Regression with Networked Predictors

We consider Bayesian variable selection in linear regression when the relationships among a possibly large number of predictors are described by a network given a priori. A class of motivating examples is to predict some clinical outcomes with high-dimensional gene expression profiles and a gene network, for which it is assumed that the genes neighboring to each other in the network are more li...


Forecasting in VAR models with large datasets

This paper deals with model selection and forecasting in vector autoregressions (VARs) in situations where the set of available predictors is inconveniently large to accommodate with methods and diagnostics used in traditional small-scale models. Available information over this large dataset can be summarized into a considerably smaller set of variables through factors estimated by the dynamic ...


Joint high-dimensional Bayesian variable and covariance selection with an application to eQTL analysis.

We describe a Bayesian technique to (a) perform a sparse joint selection of significant predictor variables and significant inverse covariance matrix elements of the response variables in a high-dimensional linear Gaussian sparse seemingly unrelated regression (SSUR) setting and (b) perform an association analysis between the high-dimensional sets of predictors and responses in such a setting. ...



Journal:
  • Computational statistics & data analysis

Volume 55, Issue 10

Pages: -

Publication date: 2011