Penalized Cox regression analysis in the high-dimensional and low-sample size settings, with applications to microarray gene expression data
نویسندگان
چکیده
MOTIVATION An important application of microarray technology is to relate gene expression profiles to various clinical phenotypes of patients. Success has been demonstrated in molecular classification of cancer in which the gene expression data serve as predictors and different types of cancer serve as a categorical outcome variable. However, there has been less research in linking gene expression profiles to the censored survival data such as patients' overall survival time or time to cancer relapse. It would be desirable to have models with good prediction accuracy and parsimony property. RESULTS We propose to use the L(1) penalized estimation for the Cox model to select genes that are relevant to patients' survival and to build a predictive model for future prediction. The computational difficulty associated with the estimation in the high-dimensional and low-sample size settings can be efficiently solved by using the recently developed least-angle regression (LARS) method. Our simulation studies and application to real datasets on predicting survival after chemotherapy for patients with diffuse large B-cell lymphoma demonstrate that the proposed procedure, which we call the LARS-Cox procedure, can be used for identifying important genes that are related to time to death due to cancer and for building a parsimonious model for predicting the survival of future patients. The LARS-Cox regression gives better predictive performance than the L(2) penalized regression and a few other dimension-reduction based methods. CONCLUSIONS We conclude that the proposed LARS-Cox procedure can be very useful in identifying genes relevant to survival phenotypes and in building a parsimonious predictive model that can be used for classifying future patients into clinically relevant high- and low-risk groups based on the gene expression profile and survival times of previous patients.
منابع مشابه
Penalized Estimators in Cox Regression Model
The proportional hazard Cox regression models play a key role in analyzing censored survival data. We use penalized methods in high dimensional scenarios to achieve more efficient models. This article reviews the penalized Cox regression for some frequently used penalty functions. Analysis of medical data namely ”mgus2” confirms the penalized Cox regression performs better than the cox regressi...
متن کاملComparison of Ordinal Response Modeling Methods like Decision Trees, Ordinal Forest and L1 Penalized Continuation Ratio Regression in High Dimensional Data
Background: Response variables in most medical and health-related research have an ordinal nature. Conventional modeling methods assume predictor variables to be independent, and consider a large number of samples (n) compared to the number of covariates (p). Therefore, it is not possible to use conventional models for high dimensional genetic data in which p > n. The present study compared th...
متن کاملHigh-dimensional pseudo-logistic regression and classification with applications to gene expression data
High dimension low sample size data, like the microarray gene expression levels, pose numerous challenges to conventional statistical methods. In the particular case of binary classification, some classification methods, such as the support vector machine (SVM), can efficiently deal with high-dimensional predictors, but lacks the accuracy in estimating the probability of membership of a class. ...
متن کاملFeature Selection and Classification of Microarray Gene Expression Data of Ovarian Carcinoma Patients using Weighted Voting Support Vector Machine
We can reach by DNA microarray gene expression to such wealth of information with thousands of variables (genes). Analysis of this information can show genetic reasons of disease and tumor differences. In this study we try to reduce high-dimensional data by statistical method to select valuable genes with high impact as biomarkers and then classify ovarian tumor based on gene expression data of...
متن کاملA Regression-Based Differential Expression Detection Algorithm for Microarray Studies with Ultra-Low Sample Size
Global gene expression analysis using microarrays and, more recently, RNA-seq, has allowed investigators to understand biological processes at a system level. However, the identification of differentially expressed genes in experiments with small sample size, high dimensionality, and high variance remains challenging, limiting the usability of these tens of thousands of publicly available, and ...
متن کاملذخیره در منابع من
با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید
برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید
ثبت ناماگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید
ورودعنوان ژورنال:
- Bioinformatics
دوره 21 13 شماره
صفحات -
تاریخ انتشار 2005