Bagging with Asymmetric Costs for Misclassified and Correctly Classified Examples
Authors
Abstract
Diversity is a key property for obtaining an advantage from combining predictors. In this paper, we propose a modification of bagging that explicitly trades off diversity and individual accuracy. The procedure consists of dividing the bootstrap replicate obtained at each iteration of the algorithm into two subsets: one containing the examples misclassified by the ensemble built up to the previous iteration, and the other containing the examples it classifies correctly. High individual accuracy of a new classifier on the first subset increases diversity, measured as the value of the Q statistic between the new classifier and the existing classifier ensemble. High accuracy on the second subset, on the other hand, decreases diversity. We trade off the two components of individual accuracy using a parameter λ ∈ [0, 1] that changes the cost of a misclassification on the second subset. Experiments are reported on well-known classification problems from the UCI repository, and results are compared with boosting and bagging.
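The paper's pseudocode is not reproduced on this page, so the following is only a minimal sketch of the idea under stated assumptions: scikit-learn decision trees as base learners, the asymmetric cost implemented as per-example sample weights (weight 1 for examples the current ensemble misclassifies, weight λ for those it already gets right), and plain majority voting. The names asymmetric_bagging, majority_vote, and lam are illustrative, not the authors' own.

import numpy as np
from sklearn.tree import DecisionTreeClassifier

def majority_vote(ensemble, X):
    # Plain majority vote over the ensemble; assumes integer class labels.
    votes = np.stack([clf.predict(X) for clf in ensemble]).astype(int)
    return np.apply_along_axis(lambda col: np.bincount(col).argmax(), 0, votes)

def asymmetric_bagging(X, y, n_estimators=10, lam=0.5, random_state=0):
    # Bagging with asymmetric costs (illustrative reconstruction, not the
    # paper's exact algorithm). lam = 1 recovers standard bagging; lam < 1
    # down-weights bootstrap examples the current ensemble already
    # classifies correctly, pushing each new classifier toward the
    # misclassified subset and thus toward higher diversity.
    rng = np.random.RandomState(random_state)
    n = len(X)
    ensemble = []
    for _ in range(n_estimators):
        idx = rng.randint(0, n, size=n)  # bootstrap replicate
        Xb, yb = X[idx], y[idx]
        if ensemble:
            pred = majority_vote(ensemble, Xb)
            w = np.where(pred != yb, 1.0, lam)  # cost 1 vs cost lam
        else:
            w = np.ones(n)  # first round: ordinary bagging step
        clf = DecisionTreeClassifier(random_state=rng.randint(1 << 30))
        clf.fit(Xb, yb, sample_weight=w)
        ensemble.append(clf)
    return ensemble

With λ near 0 a new classifier concentrates almost entirely on what the ensemble currently gets wrong, a boosting-like behaviour; with λ = 1 the split has no effect and the procedure is plain bagging.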
Similar Resources
A Baseline for Detecting Misclassified and Out-of-Distribution Examples in Neural Networks
We consider the two related problems of detecting if an example is misclassified or out-of-distribution. We present a simple baseline that utilizes probabilities from softmax distributions. Correctly classified examples tend to have greater maximum softmax probabilities than erroneously classified and out-of-distribution examples, allowing for their detection. We assess performance by defining ...
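As a concrete illustration of this baseline (a sketch assuming raw logits are available; the threshold below is arbitrary and would in practice be tuned on held-out data):

import numpy as np

def max_softmax_score(logits):
    # Maximum softmax probability (MSP) per example; lower scores flag
    # likely misclassified or out-of-distribution inputs.
    z = logits - logits.max(axis=1, keepdims=True)  # numerical stability
    p = np.exp(z) / np.exp(z).sum(axis=1, keepdims=True)
    return p.max(axis=1)

logits = np.array([[4.0, 0.1, -1.0],   # confident prediction
                   [0.3, 0.2, 0.1]])   # near-uniform, low confidence
scores = max_softmax_score(logits)
suspect = scores < 0.6  # flag low-confidence examples for inspection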
A Study Of Bagging And Boosting Approaches To Develop Meta-Classifier
Classification is a data mining technique that analyses a given data set and induces a model for each class based on the features present in the data. Bagging and boosting are heuristic approaches to developing classification models. These techniques generate a diverse ensemble of classifiers by manipulating the training data given to a base learning algorithm. They are very successfu...
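For reference, the two kinds of training-data manipulation are one line each in scikit-learn (assuming version 1.2 or later, where the parameter is named estimator; the dataset and hyperparameters here are arbitrary):

from sklearn.datasets import load_iris
from sklearn.ensemble import AdaBoostClassifier, BaggingClassifier
from sklearn.model_selection import cross_val_score
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)
base = DecisionTreeClassifier(max_depth=3)

# Bagging: each tree is fit on an independent bootstrap resample.
bag = BaggingClassifier(estimator=base, n_estimators=50, random_state=0)
# Boosting: each tree is fit on data reweighted toward earlier mistakes.
boost = AdaBoostClassifier(estimator=base, n_estimators=50, random_state=0)

for name, model in [("bagging", bag), ("boosting", boost)]:
    print(name, cross_val_score(model, X, y, cv=5).mean())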
CS 195-5: Machine Learning Problem Set 6
For this problem we assume that the set of training examples {(x_i, y_i)} is drawn from two classes such that y_i = ±1. For such two-class classification problems, the form of y_i h_m(x_i) is particularly simple: if an example is correctly classified, then y_i h_m(x_i) = 1; if an example is misclassified, then y_i h_m(x_i) = −1. As a result, Equation 3 can be decomposed as Z_m = W_+^(m−1) e^(−α_m) + W_−^(m−1) e^(α_m) ...
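The snippet cuts off mid-derivation; the step it is heading toward in the standard AdaBoost argument (a reconstruction of the usual derivation, not the problem set's own text) is minimizing Z_m over α_m:

\[
  Z_m = W_{+}^{(m-1)} e^{-\alpha_m} + W_{-}^{(m-1)} e^{\alpha_m},
  \qquad
  \frac{\partial Z_m}{\partial \alpha_m} = 0
  \;\Longrightarrow\;
  \alpha_m = \tfrac{1}{2}\,\ln\frac{W_{+}^{(m-1)}}{W_{-}^{(m-1)}},
\]

where W_+^(m−1) and W_−^(m−1) denote the total weights of the correctly and incorrectly classified examples at round m.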
Evaluating IPMN and pancreatic carcinoma utilizing quantitative histopathology
Intraductal papillary mucinous neoplasms (IPMN) are pancreatic lesions with uncertain biologic behavior. This study sought objective, accurate prediction tools, through the use of quantitative histopathological signatures of nuclear images, for classifying lesions as chronic pancreatitis (CP), IPMN, or pancreatic carcinoma (PC). Forty-four pancreatic resection patients were retrospectively iden...
A Comparison of Ensemble Creation Techniques
We experimentally evaluate bagging and six other randomization-based approaches to creating an ensemble of decision-tree classifiers. Bagging uses randomization to create multiple training sets. Other approaches, such as Randomized C4.5, apply randomization in selecting a test at a given node of a tree. Then there are approaches, such as random forests and random subspaces, that apply randomizat...