Analysis of a Fusion Method for Combining Marginal Classifiers

Authors

  • Mark D. Happel
  • Peter Bock
Abstract

The use of multiple features by a classifier often leads to a reduced probability of error, but the design of an optimal Bayesian classifier for multiple features depends on the estimation of multidimensional joint probability density functions and therefore requires a design sample size that, in general, increases exponentially with the number of dimensions. The classification method described in this paper makes decisions by combining the decisions made by multiple Bayesian classifiers, using an additional classifier that estimates the joint probability densities of the decision space rather than the joint probability densities of the feature space. A proof is presented for the restricted case of two classes and two features, showing that the method always demonstrates a probability of error that is less than or equal to the probability of error of the marginal classifier with the lowest probability of error.

1. Background

Given a set of objects and their corresponding feature vectors Χ = [χ1 χ2 ... χd]ᵀ in feature space Π, one of the fundamental problems of pattern classification is to define a function (a classifier) Ψ: Π → ∆ that can assign an appropriate class label ωi to any given Χ in the feature space. The assignment itself is called a classification decision δ ∈ ∆, and the set of all possible decisions is the decision space ∆.

In a Bayesian classifier, the classification decision is based on the a posteriori probabilities that the input is a member of each class. For a given input Χ, the a posteriori probability for class ωi, p(ωi | Χ), can be calculated using Bayes' rule:

    p(ωi | Χ) = p(Χ | ωi) P(ωi) / Σj p(Χ | ωj) P(ωj)    (1)

The Bayesian decision rule selects the class label that corresponds to the maximum a posteriori probability. The class-conditional probability density function p(Χ | ωi) is often referred to as the likelihood function [5], and the likelihood function weighted by the a priori probability P(ωi) is referred to here as the weighted likelihood. Since the sum of the weighted likelihoods (the denominator in the equation above) is positive and common to all of the a posteriori probabilities, it can be factored out and the comparison made on the weighted likelihoods instead:

    δ = ωi such that p(Χ | ωi) P(ωi) > p(Χ | ωj) P(ωj) for all j ≠ i    (2)

If the probability of error attained by a Bayesian classifier is unacceptably high for the requirements of a given problem, two or more features can be used simultaneously to form multivariate joint probability density functions. By using two or more features, the multivariate classifier is often able to achieve significantly better classification performance than a comparable univariate classifier. The Bayesian classifier is optimal in the sense that it has the lowest possible probability of error εβ for a given set of probability density functions [6]. If the classes' density functions are not known, they must be estimated from sample data. However, the estimation of multivariate density functions in high-dimensional spaces is nontrivial, and it may require an unrealistically large design sample size to attain a sufficiently accurate estimate.
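To make the decision rule concrete, the following sketch (mine, not from the paper) implements Eqs. (1) and (2) for a single feature, assuming for illustration that the class-conditional densities are univariate Gaussians with known parameters; the paper itself does not fix any parametric form.

    import numpy as np
    from scipy.stats import norm

    # Minimal sketch of the Bayesian decision rule of Eqs. (1)-(2).
    # Gaussian class-conditional densities are an illustrative assumption.
    class MarginalBayesClassifier:
        def __init__(self, means, stds, priors):
            self.means = np.asarray(means, dtype=float)    # per-class means
            self.stds = np.asarray(stds, dtype=float)      # per-class std devs
            self.priors = np.asarray(priors, dtype=float)  # P(w_i)

        def weighted_likelihoods(self, x):
            # p(x | w_i) * P(w_i): the numerator of Bayes' rule, Eq. (1)
            return norm.pdf(x, self.means, self.stds) * self.priors

        def decide(self, x):
            # Eq. (2): pick the class with the largest weighted likelihood;
            # the common denominator of Eq. (1) is the same for all classes
            # and can be ignored.
            return int(np.argmax(self.weighted_likelihoods(x)))

    # Two classes, one feature:
    clf = MarginalBayesClassifier(means=[0.0, 2.0], stds=[1.0, 1.0],
                                  priors=[0.5, 0.5])
    print(clf.decide(0.3))  # -> 0
    print(clf.decide(1.7))  # -> 1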
This "curse of dimensionality” [1] leads to an interesting paradox: as the number of dimensions increases, the theoretical performance of the Bayesian classifier improves but the practical problems involved in implementing such a classifier also increase, resulting in a decline in the actual classification performance beyond a certain threshold dimensionality [6]. Consequently, for situations in which the optimal Bayesian classifier performance is insufficient for d dimensions, it may not be possible in practice to attain better classification performance using d+1 dimensions, even though the theoretical Bayesian performance should increase. From the preceding discussion, it is apparent that a method for obtaining an improvement in the classification performance for the d-dimensional Bayesian classifier without requiring the estimation of d+1 dimensional density functions would prove useful. It is intuitively appealing to imagine combining several, lowerdimensional Bayesian classifiers in such a way as to provide a lower error rate than any one of them alone can achieve, and perhaps even to approach the error rate attainable with a higher-dimensional classifier. Current strategies for obtaining group decisions can be divided into two broad categories: dynamic classifier selection and classifier fusion [12]. Dynamic classifier selection (DCS) strategies attempt to predict or identify, for a given input, the best decision out of the set of decisions made by the individual classifiers. In contrast, classifier fusion algorithms define a function ξ: ∆→∆ that can be used to calculate a decision based on the simultaneous decisions of all of the individual classifiers. Classifier fusion methods include majority voting [9], weighted majority voting, averaged Bayesian decisions [13], naive Bayesian classifiers [2, 10], Dempster-Shafer approaches [3, 11], and stacking strategies [4]. Stacking strategies differ from other classifier fusion strategies in that the fusion function ξ: ∆→∆ is not defined a priori but is instead learned by a "combining classifier" [4]. The combining classifier Ψ receives as input the classification decisions of m member classifiers Ψi(X) and computes a final classification decision δ: δ = Ψ [Ψ1(X), Ψ2(X), . . . ,Ψm(X)] (3) In this paper, a stacking method is proposed as a means of combining marginal decisions into a single, "pseudo-multivariate" decision.


Similar Articles

Overriding the Experts: A Fusion Method for Combining Marginal Classifiers

The design of an optimal Bayesian classifier for multiple features is dependent on the estimation of multidimensional joint probability density functions and therefore requires a design sample size that increases exponentially with the number of dimensions. A method was developed that combines classification decisions from marginal density functions using an additional classifier. Unlike voting...


Designing Kernel Scheme for Classifiers Fusion

In this paper, we propose a special fusion method for combining ensembles of base classifiers using new neural networks in order to improve the overall efficiency of classification. While ensembles are typically designed so that each classifier is trained independently and decision fusion is performed as a final procedure, in this method we are interested in making the fusion process more a...


Fusion of Different Corneal Parameters to Improve the Diagnosis of Keratoconus

Purpose: To distinguish keratoconus, as well as suspected keratoconus, from healthy eyes. Methods: Certain parameters were extracted from Casia, Corvis, and Pentacam HR devices for three groups: healthy, keratoconus, and suspected keratoconus. The study was performed on 340 keratoconus eyes, 310 normal eyes, and 350 suspected-keratoconus eyes. The processing method involved the fusion of featur...


Detecting Surface Waters Using Data Fusion of Optical and Radar Remote Sensing Sensor

Identification and monitoring of surface water using remote sensing has become very important in recent decades because of its importance to human needs and political decisions. In this study, surface water is examined using the Sentinel-1 and Sentinel-2 remote sensing systems. Two data fusion approaches and decision fusion are used to improve the accuracy of surface wate...


Experiments on Individual Classifiers and on Fusion of a Set of Classifiers

In recent decades, many classification methods and fusers have been developed, and considerable gains in classification performance have been achieved by fusing and combining different classifiers. We experiment with a new method for ship infrared imagery recognition based on the fusion of individual results in order to obtain a more reliable decision [1]. To optimize the results of every class of ...





Publication date: 2000