Roulette Sampling for Cost-Sensitive Learning

Authors

  • Victor S. Sheng
  • Charles X. Ling
Abstract

In this paper, we propose a new and general preprocessor algorithm, called CSRoulette, which converts any cost-insensitive classification algorithm into a cost-sensitive one. CSRoulette is based on a cost-proportional roulette sampling technique (CPRS for short). It is closely related to Costing, another cost-sensitive meta-learning algorithm, which is based on rejection sampling. Unlike rejection sampling, which produces smaller samples, CPRS can generate samples of different sizes. To further improve its performance, we apply ensemble learning (bagging) to CPRS; the resulting algorithm is called CSRoulette. Our experiments show that CSRoulette outperforms Costing and other meta-learning methods on most of the datasets tested. In addition, we investigate the effect of various sample sizes and conclude that reduced sample sizes (as in rejection sampling) cannot be compensated for by increasing the number of bagging iterations.
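The abstract does not spell out the sampling procedure, so the following is only a minimal sketch of the general idea, not the authors' implementation. It assumes that cost-proportional roulette sampling means drawing examples with replacement, each with probability proportional to its misclassification cost, and that bagging combines the resulting models by majority vote. All names here (`roulette_sample`, `majority_learner`, `bagged_roulette`) and the toy data are hypothetical.

```python
import random
from collections import Counter


def roulette_sample(examples, costs, sample_size, rng):
    """Draw `sample_size` examples with replacement, each picked with
    probability proportional to its cost (assumed reading of CPRS)."""
    return rng.choices(examples, weights=costs, k=sample_size)


def majority_learner(sample):
    """Toy cost-insensitive base learner: always predicts the most
    common label in its training sample."""
    label = Counter(y for _, y in sample).most_common(1)[0][0]
    return lambda x: label


def bagged_roulette(examples, costs, learner, n_bags, sample_size, seed=0):
    """Train one model per roulette sample and combine them by majority
    vote (assumed bagging scheme)."""
    rng = random.Random(seed)
    models = [
        learner(roulette_sample(examples, costs, sample_size, rng))
        for _ in range(n_bags)
    ]

    def predict(x):
        votes = Counter(model(x) for model in models)
        return votes.most_common(1)[0][0]

    return predict


# Hypothetical toy data: 8 cheap negatives (cost 1), 2 expensive positives (cost 20).
data = [(i, 0) for i in range(8)] + [(i, 1) for i in range(8, 10)]
costs = [1.0] * 8 + [20.0] * 2

# Unlike rejection sampling, the sample size is a free parameter here.
predict = bagged_roulette(data, costs, majority_learner, n_bags=11, sample_size=101)
print(predict(None))
```

On the raw data the toy learner would predict the majority class 0. After cost-proportional resampling, the positives carry 40 of the 48 units of total cost, so they dominate each sample and the ensemble's vote shifts toward the expensive class.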

Similar resources

An Improved Random Forest Algorithm for Prediction of Protein-Protein Interaction

Protein-protein interaction (PPI) is the binding of two or more proteins as a result of biochemical events in a living cell. Protein domains are functional and/or structural units of a protein and are consequently responsible for protein-protein interactions. Many machine-learning approaches with domain-based models for protein interaction prediction have been presented and shown to be feasible. In this study...

A New Formulation for Cost-Sensitive Two Group Support Vector Machine with Multiple Error Rate

Support vector machine (SVM) is a popular classification technique that classifies data using a max-margin separating hyperplane. The normal vector and bias of this hyperplane are determined by solving a quadratic model, which means that SVM training amounts to an optimization problem. Among the extensions of SVM, the cost-sensitive scheme refers to a model with multiple costs which conside...

Measuring Accuracy between Ensemble Methods: AdaBoost.NC vs. SMOTE.ENN

Imbalanced class distribution is one of the main issues in data mining. The problem also arises in multi-class settings, when one class contains many more or many fewer samples than the other classes. Most existing imbalance learning techniques are designed and tested only for two-class scenarios. The new negative correlation learning (NCL) algorithm for classification ensembles, called A...

Roulette-wheel selection via stochastic acceptance

Roulette-wheel selection is a frequently used method in genetic and evolutionary algorithms or in modeling of complex networks. Existing routines select one of N individuals using search algorithms of O(N) or O(logN) complexity. We present a simple roulette-wheel selection algorithm, which typically has O(1) complexity and is based on stochastic acceptance instead of searching. We also discuss ...

Cost-Sensitive Learning by Cost-Proportionate Example Weighting

We propose and evaluate a family of methods for converting classifier learning algorithms and classification theory into cost-sensitive algorithms and theory. The proposed conversion is based on cost-proportionate weighting of the training examples, which can be realized either by feeding the weights to the classification algorithm (as often done in boosting), or by careful subsampling. We give...

Publication date: 2007