Bagging with Asymmetric Costs for Misclassified and Correctly Classified Examples

Authors

  • Ricardo Ñanculef
  • Carlos Valle
  • Héctor Allende
  • Claudio Moraga
Abstract

Diversity is a key property for obtaining advantages from combining predictors. In this paper, we propose a modification of bagging that explicitly trades off diversity and individual accuracy. The procedure consists of dividing the bootstrap replicate obtained at each iteration of the algorithm into two subsets: one containing the examples misclassified by the ensemble obtained at the previous iteration, and the other containing the examples correctly recognized. High individual accuracy of a new classifier on the first subset increases diversity, measured as the value of the Q statistic between the new classifier and the existing classifier ensemble. High accuracy on the second subset, on the other hand, decreases diversity. We trade off the two components of individual accuracy using a parameter λ ∈ [0, 1] that changes the cost of a misclassification on the second subset. Experiments are reported on well-known classification problems from the UCI repository, and results are compared with those of bagging and boosting.
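
The abstract does not spell out how the asymmetric cost enters training, so the following is a minimal sketch of one plausible reading, in which λ becomes a per-example sample weight on the bootstrap replicate. The base learner, the function names (asymmetric_bagging, majority_vote), and the use of a plain majority vote are assumptions for illustration, not details taken from the paper.

```python
# Illustrative sketch only: lam down-weights errors on examples the
# current ensemble already classifies correctly.  Assumes NumPy
# arrays and non-negative integer class labels.
import numpy as np
from sklearn.tree import DecisionTreeClassifier


def majority_vote(ensemble, X):
    """Unweighted majority vote over the ensemble members."""
    votes = np.stack([clf.predict(X) for clf in ensemble])
    # For each column (example), take the most frequent label.
    return np.array([np.bincount(col).argmax() for col in votes.T])


def asymmetric_bagging(X, y, n_estimators=10, lam=0.5, seed=0):
    """Bagging variant with asymmetric misclassification costs (sketch)."""
    rng = np.random.default_rng(seed)
    ensemble = []
    n = len(X)
    for _ in range(n_estimators):
        # Ordinary bootstrap replicate, as in standard bagging.
        idx = rng.integers(0, n, size=n)
        Xb, yb = X[idx], y[idx]
        if ensemble:
            # Split the replicate by the current ensemble's decision:
            # misclassified examples keep cost 1, correctly classified
            # examples are down-weighted to cost lam.
            correct = majority_vote(ensemble, Xb) == yb
            w = np.where(correct, lam, 1.0)
        else:
            w = np.ones(n)  # first round: plain bagging step
        ensemble.append(DecisionTreeClassifier().fit(Xb, yb, sample_weight=w))
    return ensemble
```

With λ = 1 every example carries the same cost and each round reduces to an ordinary bagging step; driving λ toward 0 makes each new classifier concentrate on the examples the current ensemble misclassifies, the direction the abstract associates with increased diversity.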

Similar Articles

A Baseline for Detecting Misclassified and Out-of-Distribution Examples in Neural Networks

We consider the two related problems of detecting if an example is misclassified or out-of-distribution. We present a simple baseline that utilizes probabilities from softmax distributions. Correctly classified examples tend to have greater maximum softmax probabilities than erroneously classified and out-of-distribution examples, allowing for their detection. We assess performance by defining ...
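
As a concrete illustration of the baseline just described, here is a minimal NumPy sketch (not code from the paper); the function names and the threshold value are illustrative assumptions, and in practice the threshold would be chosen on held-out data.

```python
import numpy as np


def softmax(logits):
    """Numerically stable softmax over the last axis."""
    z = logits - logits.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)


def flag_suspicious(logits, threshold=0.9):
    """True where the maximum softmax probability is low, i.e. where
    the prediction is more likely misclassified or out-of-distribution."""
    return softmax(logits).max(axis=-1) < threshold
```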

A Study of Bagging and Boosting Approaches to Develop Meta-Classifier

Classification is one of the data mining techniques that analyses a given data set and induces a model for each class based on the features present in the data. Bagging and boosting are heuristic approaches to developing classification models. These techniques generate a diverse ensemble of classifiers by manipulating the training data given to a base learning algorithm. They are very successfu...

CS 195-5: Machine Learning Problem Set 6

For this problem we assume that the training examples {(x_i, y_i)} are drawn from two classes such that y_i = ±1. For such two-class classification problems, the form of y_i h_m(x_i) is particularly simple: if an example is correctly classified, then y_i h_m(x_i) = 1; if an example is misclassified, then y_i h_m(x_i) = −1. As a result, Equation 3 can be decomposed as Z_m = W_+^(m−1) e^(−α_m) + W_−^(m−1) e^(α_m) ...
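
The snippet breaks off mid-derivation. For context, the usual next step in this standard AdaBoost argument (an assumption here, since the truncated text does not show it) is to choose the α_m that minimizes Z_m:

```latex
% W_+ and W_- denote the total weights of the correctly and
% incorrectly classified examples under the round-(m-1) distribution.
\[
  Z_m = W_{+}^{(m-1)} e^{-\alpha_m} + W_{-}^{(m-1)} e^{\alpha_m}
\]
% Setting the derivative with respect to \alpha_m to zero:
\[
  -W_{+}^{(m-1)} e^{-\alpha_m} + W_{-}^{(m-1)} e^{\alpha_m} = 0
  \quad\Longrightarrow\quad
  \alpha_m = \tfrac{1}{2}\ln\frac{W_{+}^{(m-1)}}{W_{-}^{(m-1)}}
\]
```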

Evaluating IPMN and pancreatic carcinoma utilizing quantitative histopathology

Intraductal papillary mucinous neoplasms (IPMN) are pancreatic lesions with uncertain biologic behavior. This study sought objective, accurate prediction tools, through the use of quantitative histopathological signatures of nuclear images, for classifying lesions as chronic pancreatitis (CP), IPMN, or pancreatic carcinoma (PC). Forty-four pancreatic resection patients were retrospectively iden...

A Comparison of Ensemble Creation Techniques

We experimentally evaluate bagging and six other randomization-based approaches to creating an ensemble of decision-tree classifiers. Bagging uses randomization to create multiple training sets. Other approaches, such as Randomized C4.5, apply randomization in selecting a test at a given node of a tree. Then there are approaches, such as random forests and random subspaces, that apply randomizat...
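
To make the distinction concrete, here is a tiny illustrative sketch (not code from the paper) contrasting two of the randomization styles named above: bagging randomizes the rows each learner sees, while random subspaces randomize the columns. Function names are assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)


def bootstrap_rows(X, y):
    """Bagging-style randomization: resample training rows with replacement."""
    idx = rng.integers(0, len(X), size=len(X))
    return X[idx], y[idx]


def random_subspace_cols(X, n_keep):
    """Random-subspace-style randomization: keep a random feature subset."""
    cols = rng.choice(X.shape[1], size=n_keep, replace=False)
    return X[:, cols], cols
```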

Publication date: 2007