Mutual information-based feature selection and partition design in fuzzy rule-based classifiers from vague data

Authors

  • Luciano Sánchez
  • M. Rosario Suárez
  • José Ramón Villar
  • Inés Couso
Abstract

Algorithms for preprocessing databases with incomplete and imprecise data are seldom studied. For the most part, we lack numerical tools to quantify the mutual information between fuzzy random variables. Therefore, these algorithms (discretization, instance selection, feature selection, etc.) have to use crisp estimations of the interdependency between continuous variables, whose application to vague datasets is arguable. In particular, when we select features for use in fuzzy rule-based classifiers, we often use a mutual information-based ranking of the relevance of inputs. But, with either crisp or fuzzy data, fuzzy rule-based systems route the input through a fuzzification interface. The fuzzification process may alter this ranking, as the partition of the input data need not be optimal. In our opinion, to discover the most important variables for a fuzzy rule-based system, we should compute the mutual information between the fuzzified variables, and we should not assume that the ranking between the crisp variables is the best one. In this paper we address these problems and propose an extended definition of the mutual information between two fuzzified continuous variables. We also introduce a numerical algorithm for estimating the mutual information from a sample of vague data. We will show that this estimation can be included in a feature selection algorithm, and also that, in combination with a genetic optimization, the same definition can be used to obtain the most informative fuzzy partition for the data. Both applications will be exemplified with the help of some benchmark problems. © 2008 Elsevier Inc. All rights reserved.
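The idea of ranking inputs by the mutual information of their fuzzified versions can be illustrated with a minimal sketch. It is not the extended definition or the estimation algorithm proposed in the paper, which the abstract does not spell out. It assumes crisp samples (no interval or fuzzy-valued observations), a strong triangular partition given by hypothetical `centers`, a crisp class label, and a simple plug-in estimate in which each sample contributes its membership degrees as fractional counts. All function names are illustrative.

```python
import math


def triangular_memberships(x, centers):
    """Membership degrees of x in a strong triangular partition (degrees sum to 1).

    `centers` is an increasing list of term centers; this is an assumed
    partition shape, not one prescribed by the paper.
    """
    k = len(centers)
    if x <= centers[0]:
        return [1.0] + [0.0] * (k - 1)
    if x >= centers[-1]:
        return [0.0] * (k - 1) + [1.0]
    mu = [0.0] * k
    for i in range(k - 1):
        lo, hi = centers[i], centers[i + 1]
        if lo <= x <= hi:
            w = (x - lo) / (hi - lo)
            mu[i] = 1.0 - w
            mu[i + 1] = w
            break
    return mu


def fuzzified_mutual_information(xs, labels, centers):
    """Plug-in mutual information (bits) between a fuzzified input and a crisp label."""
    classes = sorted(set(labels))
    k, n = len(centers), len(xs)
    # Joint "probability" of (linguistic term, class) built from fractional counts.
    joint = [[0.0] * len(classes) for _ in range(k)]
    for x, c in zip(xs, labels):
        j = classes.index(c)
        for i, m in enumerate(triangular_memberships(x, centers)):
            joint[i][j] += m / n
    p_term = [sum(row) for row in joint]
    p_class = [sum(joint[i][j] for i in range(k)) for j in range(len(classes))]
    mi = 0.0
    for i in range(k):
        for j in range(len(classes)):
            p = joint[i][j]
            if p > 0.0:
                mi += p * math.log2(p / (p_term[i] * p_class[j]))
    return mi


def rank_features(columns, labels, partitions):
    """Order feature names by decreasing fuzzified mutual information with the label."""
    scores = {
        name: fuzzified_mutual_information(col, labels, partitions[name])
        for name, col in columns.items()
    }
    return sorted(scores, key=scores.get, reverse=True)
```

Because the score depends on `centers`, changing the partition of an input can change its position in the ranking, which is the effect the abstract warns about. A search procedure (the paper uses a genetic optimization) could in principle tune `centers` to maximize this score.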

Similar resources

SUBCLASS FUZZY-SVM CLASSIFIER AS AN EFFICIENT METHOD TO ENHANCE THE MASS DETECTION IN MAMMOGRAMS

This paper is concerned with the development of a novel classifier for automatic mass detection of mammograms, based on contourlet feature extraction in conjunction with statistical and fuzzy classifiers. In this method, mammograms are segmented into regions of interest (ROI) in order to extract features including geometrical and contourlet coefficients. The extracted features benefit from...


A New Hybrid Framework for Filter based Feature Selection using Information Gain and Symmetric Uncertainty (TECHNICAL NOTE)

Feature selection is a pre-processing technique used for eliminating the irrelevant and redundant features which results in enhancing the performance of the classifiers. When a dataset contains more irrelevant and redundant features, it fails to increase the accuracy and also reduces the performance of the classifiers. To avoid them, this paper presents a new hybrid feature selection method usi...


Structure Identification of Fuzzy Classifiers

For complex and high-dimensional problems, data-driven identification of classifiers has to deal with structural issues like the selection of the relevant features and effective initial partition of the input domain. Therefore, the identification of fuzzy classifiers is a challenging topic. Decision-tree (DT) generation algorithms are effective in feature selection and extraction of crisp class...


Modeling and design of a diagnostic and screening algorithm based on hybrid feature selection-enabled linear support vector machine classification

Background: In the current study, a hybrid feature selection approach involving filter and wrapper methods is applied to some bioscience databases with various records, attributes and classes; hence, this strategy enjoys the advantages of both methods such as fast execution, generality, and accuracy. The purpose is to diagnose the disease status and estimate patient survival. Method...


Support Vector Machine Based Facies Classification Using Seismic Attributes in an Oil Field of Iran

Seismic facies analysis (SFA) aims to classify similar seismic traces based on amplitude, phase, frequency, and other seismic attributes. SFA has proven useful in interpreting seismic data, allowing significant information on subsurface geological structures to be extracted. While facies analysis has been widely investigated through unsupervised-classification-based studies, there are few cases...


Journal:
  • Int. J. Approx. Reasoning

Volume 49

Publication date: 2008