Variable Selection in Classification Trees Based on Imprecise Probabilities

Author

  • Carolin Strobl
Abstract

Classification trees are a popular statistical tool with multiple applications. Recent advancements of traditional classification trees, such as the approach based on imprecise probabilities by Abellán and Moral (2004), effectively address their tendency to overfit. However, the imprecise probability approach does not eliminate another flaw inherent in traditional classification trees: due to a systematic finite-sample bias in the estimator of the entropy criterion employed in variable selection, categorical predictor variables with low information content are preferred if they have a high number of categories. The mechanisms involved in variable selection in classification trees based on imprecise probabilities are outlined theoretically as well as by means of simulation studies. Corrected estimators are proposed, which prove capable of reducing estimation bias as a source of variable selection bias.
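The selection bias described in the abstract can be illustrated with a small simulation. The following is a minimal sketch under stated assumptions, not the paper's method: it uses the ordinary plug-in Shannon entropy rather than the imprecise-probability criterion of Abellán and Moral, and it uses the standard Miller-Madow bias correction as a stand-in, since the abstract does not say which corrected estimators are proposed. All function names and parameter values (`simulate`, `n=50`, two versus ten categories) are hypothetical choices for illustration. Both predictors are generated independently of a binary response, so the true information gain of each is zero.

```python
import math
import random
from collections import Counter, defaultdict


def plugin_entropy(counts):
    """Plug-in (maximum-likelihood) Shannon entropy in nats."""
    n = sum(counts)
    return -sum(c / n * math.log(c / n) for c in counts if c > 0)


def miller_madow_entropy(counts):
    """Plug-in entropy plus the Miller-Madow term (m - 1) / (2n),
    where m is the number of non-empty cells (assumed correction)."""
    n = sum(counts)
    observed = sum(1 for c in counts if c > 0)
    return plugin_entropy(counts) + (observed - 1) / (2 * n)


def information_gain(xs, ys, entropy):
    """Entropy of the response minus the weighted entropy within
    each observed category of the predictor."""
    n = len(ys)
    by_category = defaultdict(list)
    for x, y in zip(xs, ys):
        by_category[x].append(y)
    within = sum(
        len(group) / n * entropy(list(Counter(group).values()))
        for group in by_category.values()
    )
    return entropy(list(Counter(ys).values())) - within


def simulate(n=50, k_small=2, k_large=10, reps=1000, seed=1):
    """Compare two uninformative predictors with k_small and k_large
    categories. Returns, per estimator, the share of replications in
    which the many-category predictor has the larger estimated gain,
    and the mean estimated gains as [small, large]."""
    rng = random.Random(seed)
    estimators = {"plugin": plugin_entropy, "miller_madow": miller_madow_entropy}
    wins = {name: 0 for name in estimators}
    mean_gain = {name: [0.0, 0.0] for name in estimators}
    for _ in range(reps):
        # Response and predictors are drawn independently of each other.
        ys = [rng.randrange(2) for _ in range(n)]
        x_small = [rng.randrange(k_small) for _ in range(n)]
        x_large = [rng.randrange(k_large) for _ in range(n)]
        for name, entropy in estimators.items():
            g_small = information_gain(x_small, ys, entropy)
            g_large = information_gain(x_large, ys, entropy)
            wins[name] += g_large > g_small
            mean_gain[name][0] += g_small / reps
            mean_gain[name][1] += g_large / reps
    share = {name: wins[name] / reps for name in estimators}
    return share, mean_gain


if __name__ == "__main__":
    share, mean_gain = simulate()
    print("share of wins for the many-category predictor:", share)
    print("mean estimated gain [small, large]:", mean_gain)
```

With the plug-in estimator, the predictor with more categories should win the comparison far more often than half the time even though neither predictor carries information. The corrected estimator should pull the mean gains closer to zero and reduce that preference, although differences in variance between the two estimates can remain.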

Similar resources

Variable Selection Bias in Classification Trees Based on Imprecise Probabilities

Classification trees are a popular statistical tool with multiple applications. Recent advancements of traditional classification trees, such as the approach of classification trees based on imprecise probabilities by Abellán and Moral (2005), effectively address their tendency to overfit. However, another flaw inherent in traditional classification trees is not eliminated by the imprecise ...

MASTER THESIS by Paul Fink Ensemble methods for classification trees under imprecise probabilities

In this master thesis some properties of bags of imprecise classification trees, as introduced in Abellán and Masegosa (2010), are analysed. In the beginning the statistical background of imprecise classification trees is outlined, starting with an overview of measuring uncertainty within the concept of Dempster–Shafer theory, followed by a discussion of its application in a tree–...

Improving the Naive Bayes Classifier via a Quick Variable Selection Method Using Maximum of Entropy

Variable selection methods play an important role in the field of attribute mining. The Naive Bayes (NB) classifier is a very simple and popular classification method that yields good results in a short processing time. Hence, it is a very appropriate classifier for very large datasets. The method has a high dependence on the relationships between the variables. The Info-Gain (IG) measure, whic...

Multinomial Nonparametric Predictive Inference: Selection, Classification and Subcategory Data

In probability and statistics, uncertainty is usually quantified using single-valued probabilities satisfying Kolmogorov’s axioms. Generalisation of classical probability theory leads to various less restrictive representations of uncertainty which are collectively referred to as imprecise probability. Several approaches to statistical inference using imprecise probability have been suggested, ...

Ensembles of decision trees based on imprecise probabilities and uncertainty measures

In this paper, we present an experimental comparison among different strategies for combining decision trees built by means of imprecise probabilities and uncertainty measures. It has been proven that the combination or fusion of the information obtained from several classifiers can improve the...

Publication date: 2005