First study of the behaviour of genetic fuzzy classifier based on low quality data respect to the preprocessing of low quality imbalanced datasets

Authors

  • Ana M. Palacios
  • Luciano Sánchez
  • Inés Couso
Abstract

Some real-world datasets contain classes with very different percentages of patterns: some classes are represented by many examples (a high percentage of patterns) and others by few examples (a low percentage of patterns). Such datasets are called “imbalanced datasets”. In classification problems, imbalanced datasets are a focus of study for both preprocessing mechanisms and classification systems. In this paper we study the behaviour of a genetic fuzzy system (GFS) that is able to handle low quality data when it is applied to imbalanced datasets. We analyse the different preprocessing mechanisms for imbalanced datasets and show the need to extend these mechanisms to “low quality imbalanced datasets”. In addition, we give a comprehensive description of a new algorithm able to preprocess low quality imbalanced datasets. Several real-world low quality imbalanced datasets are used to evaluate the results obtained with the GFS after applying the new algorithm proposed in this paper.
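The abstract does not spell out the proposed algorithm, so the following is only an illustrative sketch of what preprocessing a low quality imbalanced dataset could look like, not the authors' method. It assumes that each low quality attribute value is stored as an interval `(lower, upper)`, and it swaps in a generic SMOTE-style interpolation applied to the interval bounds. The function names `imbalance_ratio`, `midpoint_distance` and `oversample_minority`, and the parameters `k` and `seed`, are hypothetical.

```python
import random
from collections import Counter


def imbalance_ratio(labels):
    """Majority class size divided by minority class size."""
    counts = Counter(labels)
    return max(counts.values()) / min(counts.values())


def midpoint_distance(a, b):
    """Squared Euclidean distance between the interval midpoints of two samples."""
    return sum(((lo1 + hi1) / 2 - (lo2 + hi2) / 2) ** 2
               for (lo1, hi1), (lo2, hi2) in zip(a, b))


def oversample_minority(samples, labels, k=3, seed=0):
    """Add synthetic interval-valued minority samples until classes are balanced.

    Assumption: every sample is a list of (lower, upper) intervals.
    This is a generic SMOTE-style sketch, not the algorithm of the paper.
    """
    rng = random.Random(seed)
    counts = Counter(labels)
    minority = min(counts, key=counts.get)
    needed = max(counts.values()) - counts[minority]
    pool = [s for s, y in zip(samples, labels) if y == minority]
    new_samples, new_labels = list(samples), list(labels)
    for _ in range(needed):
        base = rng.choice(pool)
        # Up to k nearest minority neighbours, compared by interval midpoints.
        neighbours = sorted((s for s in pool if s is not base),
                            key=lambda s: midpoint_distance(base, s))[:k]
        other = rng.choice(neighbours) if neighbours else base
        t = rng.random()
        # Interpolate lower and upper bounds separately; lower <= upper is preserved.
        synthetic = [(lo1 + t * (lo2 - lo1), hi1 + t * (hi2 - hi1))
                     for (lo1, hi1), (lo2, hi2) in zip(base, other)]
        new_samples.append(synthetic)
        new_labels.append(minority)
    return new_samples, new_labels
```

Interpolating the two bounds with the same weight keeps every synthetic value a valid interval, so the imprecision of the original observations is carried over to the generated ones instead of being collapsed to a single point.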

Similar resources

Proposing a Novel Cost Sensitive Imbalanced Classification Method based on Hybrid of New Fuzzy Cost Assigning Approaches, Fuzzy Clustering and Evolutionary Algorithms

In this paper, a new hybrid methodology is introduced to design a cost-sensitive fuzzy rule-based classification system. A novel cost metric is proposed based on the combination of three different concepts: Entropy, Gini index and DKM criterion. In order to calculate the effective cost of patterns, a hybrid of fuzzy c-means clustering and particle swarm optimization algorithm is utilized. This ...


Combining Adaboost with Preprocessing Algorithms for Extracting Fuzzy Rules from Low Quality Data in Possibly Imbalanced Problems

An extension of the Adaboost algorithm for obtaining fuzzy rule-based systems from low quality data is combined with preprocessing algorithms for equalizing imbalanced datasets. With the help of synthetic and real-world problems, it is shown that the performance of the Adaboost algorithm is degraded in the presence of moderate uncertainty in either the input or the output values. It is also estab...


Enhancing Learning from Imbalanced Classes via Data Preprocessing: A Data-Driven Application in Metabolomics Data Mining

This paper presents a data mining application in metabolomics. It aims at building an enhanced machine learning classifier that can be used for diagnosing cachexia syndrome and identifying its involved biomarkers. To achieve this goal, a data-driven analysis is carried out using a public dataset consisting of 1H-NMR metabolite profile. This dataset suffers from the problem of imbalanced classes...


A New Method Based on Genetic Programming for Weighting Fuzzy Rules in Imbalanced Classification

In classification problems, we often encounter datasets whose classes have different percentages of patterns (i.e. classes with a high pattern percentage and classes with a low pattern percentage). These problems are called “classification problems with imbalanced datasets”. Fuzzy rule-based classification systems are the most popular fuzzy modeling systems used in pattern classification problems. Rule weights...


A Fuzzy-Evolutionary Method for Software Defect Detection

Software defect detection is one of the most important challenges of software development and it is the most prohibitive process in software development. The early detection of fault-prone modules helps software project managers to allocate the limited cost, time, and effort of developers for testing the defect-prone modules more intensively. In this paper, according to the importance of soft...


Journal title:

Volume   Issue 

Pages  -

Publication date: 2010