Comparison of Ordinal Response Modeling Methods like Decision Trees, Ordinal Forest and L1 Penalized Continuation Ratio Regression in High Dimensional Data
Authors
Abstract:
Background: In most medical and health-related research, response variables are ordinal. Conventional modeling methods assume that the predictor variables are independent and that the number of samples (n) is large relative to the number of covariates (p). They therefore cannot be applied to high-dimensional genetic data, in which p > n. The present study compared the predictive performance of decision trees, ordinal forest, and L1 penalized continuation ratio regression.
Materials and Methods: Three data sets were used. The B-cell data set contained 12,625 gene expression features for 128 patients, with a four-level ordinal response variable. The HCC data set, on liver cancer, included 1,469 genes for 56 patients, with a three-level ordinal response variable. The Heart data set contained five variables for 294 patients undergoing angiography, with a five-level ordinal response variable. The methods were compared on the same training and test data sets using indicators such as accuracy, gamma, and kappa.
Results: For the two high-dimensional data sets, the ordinal forest model had higher predictive ability, whereas for the low-dimensional data set, the L1 penalized continuation ratio model had better predictive performance.
Conclusion: The best prediction model depends on the data set, so different methods should be considered for each data set to arrive at the best model.
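As an illustration of the three indicators named in the abstract, the following is a minimal Python sketch, not the authors' code. It assumes that "gamma" refers to the Goodman-Kruskal gamma and "kappa" to the unweighted Cohen's kappa; the abstract does not say which variants were used. The function names and the example labels are hypothetical, and ordinal levels are assumed to be coded as increasing integers.

```python
from itertools import combinations


def accuracy(y_true, y_pred):
    """Share of observations whose predicted level equals the observed level."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def goodman_kruskal_gamma(y_true, y_pred):
    """(C - D) / (C + D) over all pairs of observations; tied pairs are ignored.

    Assumption: this is the 'gamma' meant in the abstract.
    """
    concordant = discordant = 0
    for (t1, p1), (t2, p2) in combinations(zip(y_true, y_pred), 2):
        direction = (t1 - t2) * (p1 - p2)
        if direction > 0:
            concordant += 1
        elif direction < 0:
            discordant += 1
    if concordant + discordant == 0:
        return float("nan")  # every pair is tied, gamma is undefined
    return (concordant - discordant) / (concordant + discordant)


def cohen_kappa(y_true, y_pred):
    """Unweighted Cohen's kappa: agreement beyond what chance would produce.

    Assumption: the study may have used a weighted variant instead.
    """
    y_true, y_pred = list(y_true), list(y_pred)
    n = len(y_true)
    levels = sorted(set(y_true) | set(y_pred))
    observed = accuracy(y_true, y_pred)
    # Chance agreement from the marginal frequencies of each level.
    expected = sum((y_true.count(k) / n) * (y_pred.count(k) / n) for k in levels)
    if expected == 1.0:
        return float("nan")  # only one level present, kappa is undefined
    return (observed - expected) / (1.0 - expected)


# Made-up labels for illustration only (levels coded 1 < 2 < 3).
observed_levels = [1, 1, 2, 2, 3, 3]
predicted_levels = [1, 2, 2, 2, 3, 2]
print("accuracy:", accuracy(observed_levels, predicted_levels))
print("gamma:", goodman_kruskal_gamma(observed_levels, predicted_levels))
print("kappa:", cohen_kappa(observed_levels, predicted_levels))
```

Accuracy and unweighted kappa treat every misclassification alike, while gamma looks only at whether pairs of patients are ranked in the same order by the observed and predicted levels. The three can therefore rank models differently on the same test set.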
Related articles
Modeling Paired Ordinal Response Data
About 25 years ago, McCullagh proposed a method for modeling univariate ordinal responses. After this paper was published, other statisticians gradually extended his method, such that we are now able to use more complicated but efficient methods to analyze correlated multivariate ordinal data, and model the relationship between these responses and a host of covariates. In this paper, we aim to...
Penalized Regression with Ordinal Predictors
Ordered categorial predictors are a common case in regression modeling. In contrast to the case of ordinal response variables, ordinal predictors have been largely neglected in the literature. In this article penalized regression techniques are proposed. Based on dummy coding two types of penalization are explicitly developed; the first imposes a difference penalty, the second is a ridge type r...
Penalized Ordinal Regression Methods for Predicting Stage of Cancer in High-Dimensional Covariate Spaces
The pathological description of the stage of a tumor is an important clinical designation and is considered, like many other forms of biomedical data, an ordinal outcome. Currently, statistical methods for predicting an ordinal outcome using clinical, demographic, and high-dimensional correlated features are lacking. In this paper, we propose a method that fits an ordinal response model to pred...
A risk ratio comparison of L0 and L1 penalized regression
In the past decade, there has been an explosion of interest in using l1-regularization in place of l0-regularization for feature selection. We present theoretical results showing that while l1-penalized linear regression never outperforms l0-regularization by more than a constant factor, in some cases using an l1 penalty is infinitely worse than using an l0 penalty. We also compare algorithms f...
Prediction of Ordinal Classes Using Regression Trees
This paper is devoted to the problem of learning to predict ordinal (i.e., ordered discrete) classes using classification and regression trees. We start with S-CART, a tree induction algorithm, and study various ways of transforming it into a learner for ordinal classification tasks. These algorithm variants are compared on a number of benchmark data sets to verify the relative strengths and we...
Methods for regression analysis in high-dimensional data
With advances in science, knowledge, and technology, new and precise methods for measuring, collecting, and recording information have been developed, which has led to the emergence and growth of high-dimensional data. A high-dimensional data set, i.e., a data set in which the number of explanatory variables is much larger than the number of observations, cannot be easily analyzed by ...
Journal title
Volume 24, Issue 5
Pages 454-468
Publication date: 2021-11