Twin Boosting: improved feature selection and prediction
Abstract
We propose Twin Boosting, which has much better feature selection behavior than boosting, particularly with respect to reducing the number of false positives (falsely selected features). In addition, for cases with a few effective features and many noise features, Twin Boosting also substantially improves the predictive accuracy of boosting. Twin Boosting is as general and generic as boosting. It can be used with general weak learners and in a wide variety of situations, including generalized regression, classification or survival modeling. Furthermore, it is computationally feasible for large problems with potentially many more features than observed samples. Finally, for the special case of orthonormal linear models, we prove equivalence of Twin Boosting to the adaptive Lasso, which yields a theoretical basis for some properties of Twin Boosting.
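The abstract does not spell out the algorithm, so the following is only a minimal sketch of one plausible reading for the linear-model case: a first round of componentwise L2Boosting, followed by a second round in which each feature's selection score is weighted by its squared first-round coefficient. The function names `l2_boost` and `twin_boost`, the step size `nu`, the iteration count `n_iter`, the squared-coefficient weighting, and the simulated data are all assumptions made for illustration and are not taken from the abstract. The sketch also assumes that `X` and `y` are centered and that no column of `X` is identically zero.

```python
import numpy as np


def l2_boost(X, y, n_iter=100, nu=0.1, weights=None):
    """Componentwise L2Boosting for a linear model (illustrative sketch).

    `weights` (hypothetical parameter) rescales each feature's selection
    score; with weights=None this is plain componentwise L2Boosting.
    """
    n, p = X.shape
    beta = np.zeros(p)
    resid = y.astype(float).copy()
    col_ss = (X ** 2).sum(axis=0)        # squared norm of each column
    if weights is None:
        weights = np.ones(p)
    for _ in range(n_iter):
        inner = X.T @ resid              # inner products with the residuals
        gain = inner ** 2 / col_ss       # reduction in residual sum of squares
        score = weights * gain           # weighted selection criterion
        j = int(np.argmax(score))
        if score[j] <= 0:                # nothing left to gain
            break
        step = nu * inner[j] / col_ss[j] # shrunken least-squares update
        beta[j] += step
        resid -= step * X[:, j]
    return beta


def twin_boost(X, y, n_iter=100, nu=0.1):
    """Two boosting rounds; the second is steered by the first (assumed scheme)."""
    beta_init = l2_boost(X, y, n_iter, nu)
    # Features never selected in round one get weight zero and therefore
    # cannot be selected in round two.
    beta_twin = l2_boost(X, y, n_iter, nu, weights=beta_init ** 2)
    return beta_init, beta_twin


# Simulated example: 3 effective features, 47 noise features (made-up data).
rng = np.random.default_rng(0)
X = rng.standard_normal((100, 50))
X -= X.mean(axis=0)
y = 2.0 * X[:, 0] - 1.5 * X[:, 1] + X[:, 2] + 0.5 * rng.standard_normal(100)
y -= y.mean()

beta_init, beta_twin = twin_boost(X, y)
print("selected in round one:", np.count_nonzero(beta_init))
print("selected in round two:", np.count_nonzero(beta_twin))
```

Under this weighting, the features selected in the second round are always a subset of those selected in the first, which is one way the number of falsely selected features could shrink. Whether this matches the authors' exact procedure should be checked against the paper itself.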
Similar resources
An improved boosting based on feature selection for corporate bankruptcy prediction
With the recent financial crisis and European debt crisis, corporate bankruptcy prediction has become an increasingly important issue for financial institutions. Many statistical and intelligent methods have been proposed; however, no single method has proved best overall for predicting corporate bankruptcy. Recent studies suggest ensemble learning methods may have potential applicability i...
Multi-class HingeBoost. Method and application to the classification of cancer types using gene expression data.
BACKGROUND: Multi-class molecular cancer classification has great potential clinical implications. Such applications require statistical methods to accurately classify cancer types with a small subset of genes from thousands of genes in the data. OBJECTIVES: This paper presents a new functional gradient descent boosting algorithm that directly extends the HingeBoost algorithm from the binary ca...
Amazon Employee Access Control System
In this work, based on historical data from 2010-2011 from Amazon Inc., we build a system that aims to take the place of resource administrators at Amazon. Our analysis shows that the given dataset is highly imbalanced with categorical values. Thus, in the preprocessing step, we tried different sampling methods, feature selection, and one-hot encoding to make the data more suitable for predi...
Robust twin boosting for feature selection from high-dimensional omics data with label noise
Omics data such as microarray transcriptomic and mass spectrometry proteomic data are typically characterized by high dimensionality and relatively small sample sizes. In order to discover biomarkers for diagnosis and prognosis from omics data, feature selection has become an indispensable step to find a parsimonious set of informative features. However, many previous studies report considerabl...
Active relational rule learning in a constrained confidence rated boosting framework
In this dissertation, I investigate the potential of boosting within the framework of relational rule learning. Boosting is a particularly robust and powerful technique to enhance the prediction accuracy of systems that learn from examples. Although boosting has been extensively studied in recent years for propositional learning systems, little attention has been paid to boosting in rela...
Journal: Statistics and Computing
Volume: 20
Publication date: 2010