An XCS-Based Algorithm for Classifying Imbalanced Datasets

Authors

  • Hooman Sanatkar
  • Saman Haratizadeh
Abstract

Imbalanced datasets are datasets in which samples are unevenly distributed across classes: one class contains significantly more samples than the others. Learning a classification model from such data has been shown to be difficult. In this paper we focus on learning classifier systems and propose a new XCS-based approach for learning classification models from imbalanced datasets. The main idea is to update the important parameters of the learning method based on the information gathered at each learning step, so that the minority class gets a fair chance to contribute to the final model. We evaluate the approach on well-known real-world imbalanced datasets. The results show that the new algorithm achieves a high detection rate and a low false positive rate.
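
The abstract reports a detection rate and a false positive rate without defining them. A minimal sketch of the usual definitions follows, assuming the minority class is treated as the positive class; the function name and the counts are hypothetical and do not come from the paper.

```python
def detection_and_false_positive_rate(tp, fn, fp, tn):
    """Return (detection rate, false positive rate) from confusion-matrix counts.

    Assumption: the minority class is the positive class.
    tp, fn: minority samples classified correctly / incorrectly.
    fp, tn: majority samples classified incorrectly / correctly.
    """
    positives = tp + fn
    negatives = fp + tn
    # Detection rate (recall): share of minority samples that were found.
    detection_rate = tp / positives if positives else 0.0
    # False positive rate: share of majority samples flagged as minority.
    false_positive_rate = fp / negatives if negatives else 0.0
    return detection_rate, false_positive_rate


if __name__ == "__main__":
    # Hypothetical counts for a dataset with 10 minority and 90 majority samples.
    dr, fpr = detection_and_false_positive_rate(tp=8, fn=2, fp=5, tn=85)
    print(f"detection rate = {dr:.3f}, false positive rate = {fpr:.3f}")
```

Reporting both rates matters on imbalanced data because plain accuracy can stay high even when every minority sample is missed.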

Similar resources

Mining Imbalanced Data with Learning Classifier Systems

This chapter investigates the capabilities of XCS for mining imbalanced datasets. Initial experiments show that, for moderate and high class imbalances, XCS tends to evolve a large proportion of overgeneral classifiers. Theoretical analyses are developed, deriving an imbalance bound up to which XCS should be able to differentiate between accurate and overgeneral classifiers. Some relevant param...

Effects of Distance between Classes and Training Datasets Size to the Performance of XCS: Case of Imbalance Datasets

This paper analyzes the effects of distance between classes and training datasets size to XCS classifier system on imbalanced datasets. Our purpose is to answer the question whether the loss of performance incurred by the classifier faced with class imbalance problems stems from the class imbalance per se or it can be explained in some other ways. The experiments from 250 artificial imbalanced ...

CUSBoost: Cluster-based Under-sampling with Boosting for Imbalanced Classification

Class imbalance classification is a challenging research problem in data mining and machine learning, as most of the real-life datasets are often imbalanced in nature. Existing learning algorithms maximise the classification accuracy by correctly classifying the majority class, but misclassify the minority class. However, the minority class instances are representing the concept with greater in...

Proposing a Novel Cost Sensitive Imbalanced Classification Method based on Hybrid of New Fuzzy Cost Assigning Approaches, Fuzzy Clustering and Evolutionary Algorithms

In this paper, a new hybrid methodology is introduced to design a cost-sensitive fuzzy rule-based classification system. A novel cost metric is proposed based on the combination of three different concepts: Entropy, Gini index and DKM criterion. In order to calculate the effective cost of patterns, a hybrid of fuzzy c-means clustering and particle swarm optimization algorithm is utilized. This ...

A hybrid filter-based feature selection method via hesitant fuzzy and rough sets concepts

High dimensional microarray datasets are difficult to classify since they have many features with small number of instances and imbalanced distribution of classes. This paper proposes a filter-based feature selection method to improve the classification performance of microarray datasets by selecting the significant features. Combining the concepts of rough sets, weighted rough set, fuzzy rough se...


Publication date: 2015