Variable Selection in Classification Trees Based on Imprecise Probabilities
Abstract
Classification trees are a popular statistical tool with multiple applications. Recent advancements of traditional classification trees, such as the approach based on imprecise probabilities by Abellán and Moral (2004), effectively address their tendency to overfit. However, the imprecise probability approach does not eliminate another flaw inherent in traditional classification trees: due to a systematic finite-sample bias in the estimator of the entropy criterion employed in variable selection, categorical predictor variables with low information content are preferred if they have a high number of categories. The mechanisms involved in variable selection in classification trees based on imprecise probabilities are outlined theoretically as well as by means of simulation studies. Corrected estimators are proposed, which prove capable of reducing estimation bias as a source of variable selection bias.
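The selection bias described in the abstract can be illustrated with a small simulation. The sketch below is not the paper's method: it assumes the ordinary plug-in (maximum likelihood) Shannon entropy rather than the imprecise-probability criterion, and the function names, sample size, and repetition count are hypothetical choices made for illustration. It draws a binary response and a predictor that is independent of it, so the true information gain is zero, and compares the average estimated gain for a predictor with 2 categories against one with 10.

```python
import math
import random
from collections import Counter


def plugin_entropy(labels):
    """Plug-in Shannon entropy (in nats) from observed relative frequencies."""
    n = len(labels)
    return -sum(c / n * math.log(c / n) for c in Counter(labels).values())


def plugin_info_gain(x, y):
    """Estimated information gain of predictor x for response y."""
    n = len(y)
    groups = {}
    for xi, yi in zip(x, y):
        groups.setdefault(xi, []).append(yi)
    # Conditional entropy: weighted average of within-category entropies.
    cond = sum(len(g) / n * plugin_entropy(g) for g in groups.values())
    return plugin_entropy(y) - cond


def mean_gain(k, n=100, reps=500, seed=0):
    """Average estimated gain of an uninformative predictor with k categories."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(reps):
        y = [rng.randrange(2) for _ in range(n)]  # binary response
        x = [rng.randrange(k) for _ in range(n)]  # independent of y
        total += plugin_info_gain(x, y)
    return total / reps


gain_2 = mean_gain(k=2)
gain_10 = mean_gain(k=10)
print(f"mean estimated gain, 2 categories:  {gain_2:.4f}")
print(f"mean estimated gain, 10 categories: {gain_10:.4f}")
```

Both averages come out positive although the true gain is zero, and the predictor with more categories receives the larger average estimate. This is the pattern the abstract attributes to finite-sample bias: each within-category entropy is estimated from fewer observations and is therefore underestimated more strongly.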
Similar resources
Variable Selection Bias in Classification Trees Based on Imprecise Probabilities
Classification trees are a popular statistical tool with multiple applications. Recent advancements of traditional classification trees, such as the approach of classification trees based on imprecise probabilities by Abellán and Moral (2005), effectively address their tendency to overfit. However, another flaw inherent in traditional classification trees is not eliminated by the imprecise ...
Master thesis by Paul Fink: Ensemble methods for classification trees under imprecise probabilities
In this master thesis, some properties of bags of imprecise classification trees, as introduced in Abellán and Masegosa (2010), are analysed. At the outset, the statistical background of imprecise classification trees is outlined, starting with an overview of measuring uncertainty within Dempster–Shafer theory, followed by a discussion of its application in a tree–...
Improving the Naive Bayes Classifier via a Quick Variable Selection Method Using Maximum of Entropy
Variable selection methods play an important role in the field of attribute mining. The Naive Bayes (NB) classifier is a very simple and popular classification method that yields good results in a short processing time. Hence, it is a very appropriate classifier for very large datasets. The method has a high dependence on the relationships between the variables. The Info-Gain (IG) measure, whic...
Multinomial Nonparametric Predictive Inference: Selection, Classification and Subcategory Data
In probability and statistics, uncertainty is usually quantified using single-valued probabilities satisfying Kolmogorov’s axioms. Generalisation of classical probability theory leads to various less restrictive representations of uncertainty which are collectively referred to as imprecise probability. Several approaches to statistical inference using imprecise probability have been suggested, ...
Ensembles of decision trees based on imprecise probabilities and uncertainty measures
In this paper, we present an experimental comparison among different strategies for combining decision trees built by means of imprecise probabilities and uncertainty measures. It has been proven that the combination or fusion of the information obtained from several classifiers can improve the...
Publication date: 2005