Variable Selection in Classification Trees Based on Imprecise Probabilities

Author

Carolin Strobl

Abstract

Classification trees are a popular statistical tool with many applications. Recent developments of traditional classification trees, such as the classification trees based on imprecise probabilities proposed by Abellán and Moral (2004), effectively address their tendency to overfit. However, the imprecise probability approach does not eliminate another flaw inherent in traditional classification trees: because the estimator of the entropy criterion used in variable selection has a systematic finite-sample bias, categorical predictor variables with low information content are preferred if they have a high number of categories. The mechanisms involved in variable selection in classification trees based on imprecise probabilities are outlined theoretically and examined in simulation studies. Corrected estimators are proposed, which prove capable of reducing estimation bias as a source of variable selection bias.
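The selection bias described above can be illustrated for the classical (precise) plug-in entropy estimator with a small simulation. This is a minimal sketch under stated assumptions: it does not implement the imprecise-probability criterion of Abellán and Moral (2004) or the corrected estimators proposed in the paper. The helper names (`entropy`, `information_gain`, `corrected_gain`, `simulate`) are hypothetical, and the subtracted term (n_x - 1)(n_y - 1)/(2N) is a standard first-order, Miller-Madow-style bias adjustment swapped in purely for illustration. A predictor that is independent of the class receives a positive empirical information gain that grows with its number of categories.

```python
import numpy as np


def entropy(counts):
    """Plug-in (empirical) Shannon entropy, in nats, of a vector of counts."""
    counts = np.asarray(counts, dtype=float)
    p = counts[counts > 0] / counts.sum()
    return float(-(p * np.log(p)).sum())


def information_gain(x, y, n_x, n_y):
    """Plug-in information gain H(Y) - H(Y | X) for integer-coded x and y."""
    table = np.zeros((n_x, n_y))
    np.add.at(table, (x, y), 1)  # predictor-by-class contingency table
    n = table.sum()
    h_y = entropy(table.sum(axis=0))
    # Conditional entropy: class entropy within each category, weighted by category size.
    h_y_given_x = sum(row.sum() / n * entropy(row) for row in table if row.sum() > 0)
    return h_y - h_y_given_x


def corrected_gain(x, y, n_x, n_y):
    """Hypothetical correction: subtract the first-order bias term (n_x-1)(n_y-1)/(2N)."""
    return information_gain(x, y, n_x, n_y) - (n_x - 1) * (n_y - 1) / (2 * len(y))


def simulate(n_x, n_obs=100, n_y=2, reps=500, seed=0):
    """Mean plug-in and corrected gain of a predictor that is independent of the class."""
    rng = np.random.default_rng(seed)
    plug, corr = [], []
    for _ in range(reps):
        x = rng.integers(0, n_x, size=n_obs)  # uninformative predictor with n_x categories
        y = rng.integers(0, n_y, size=n_obs)  # class label, drawn independently of x
        plug.append(information_gain(x, y, n_x, n_y))
        corr.append(corrected_gain(x, y, n_x, n_y))
    return float(np.mean(plug)), float(np.mean(corr))


if __name__ == "__main__":
    for k in (2, 5, 10):
        plug, corr = simulate(k)
        print(f"{k:2d} categories: plug-in gain {plug:.4f}, corrected gain {corr:.4f}")
```

Running the script should show the plug-in gain of the uninformative predictor increasing with its number of categories, while the corrected gain stays close to zero; the exact figures depend on the seed and sample size.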


Similar resources

Variable Selection Bias in Classification Trees Based on Imprecise Probabilities


Ensemble methods for classification trees under imprecise probabilities (master thesis by Paul Fink)

In this master thesis, some properties of bags of imprecise classification trees, as introduced in Abellán and Masegosa (2010), are analysed. First, the statistical background of imprecise classification trees is outlined, starting with an overview of measuring uncertainty within Dempster–Shafer theory, followed by a discussion of its application in a tree–...


Improving the Naive Bayes Classifier via a Quick Variable Selection Method Using Maximum of Entropy

Variable selection methods play an important role in the field of attribute mining. The Naive Bayes (NB) classifier is a very simple and popular classification method that yields good results in a short processing time. Hence, it is a very appropriate classifier for very large datasets. The method has a high dependence on the relationships between the variables. The Info-Gain (IG) measure, whic...


Multinomial Nonparametric Predictive Inference: Selection, Classification and Subcategory Data

In probability and statistics, uncertainty is usually quantified using single-valued probabilities satisfying Kolmogorov’s axioms. Generalisation of classical probability theory leads to various less restrictive representations of uncertainty which are collectively referred to as imprecise probability. Several approaches to statistical inference using imprecise probability have been suggested, ...


Ensembles of decision trees based on imprecise probabilities and uncertainty measures

In this paper, we present an experimental comparison among different strategies for combining decision trees built by means of imprecise probabilities and uncertainty measures. It has been proven that the combination or fusion of the information obtained from several classifiers can improve the... (J. Abellán, E (2012), http://dx.doi.org/10.1016/j.inffus.2012.0)


Publication date: 2005
