Similarity Discriminant Analysis
Author
Abstract
This chapter details similarity discriminant analysis (SDA), a new framework for similarity-based classification. The SDA framework has two defining characteristics: its classifiers are similarity-based, because they classify based on the pairwise similarities of data samples, and they are generative, because they build class-dependent probability models of the similarities between samples. Similarity-based classifiers already exist, and so do classifiers based on generative models; SDA is a new framework comprising classifiers that are both. Within the general SDA framework, this chapter describes several families of classifiers: the SDA classifier, the local SDA classifier, and the mixture SDA classifier.

The SDA classifier is the foundation of SDA. It classifies based on class-conditional generative models of the similarity of the samples to representative class prototypes, or centroids. The SDA framework is introduced, developed, and discussed with the aid of this centroid-based SDA classifier. The centroid-based SDA classifier is then generalized beyond class centroids to arbitrary class-descriptive statistics. Other possible statistics are described, illustrating the power and generality of the SDA framework.

The local SDA classifier is a local version of the SDA classifier. It builds similarity-based class-conditional generative models within a neighborhood of the test sample to be classified. The local class models have low bias and retain the interpretability associated with generative probability models. Local SDA is a consistent classifier, in the sense that its error rate converges to the Bayes error rate, which is the best error rate attainable by any classifier.

The mixture SDA classifier draws from the well-established metric learning mixture model research. It generalizes the single-centroid SDA classifier to a mixture of single-centroid SDA components. The mixture SDA classifier can be trained with an expectation-maximization (EM) algorithm that parallels the standard EM approach for the well-known Gaussian mixture models.

The problem of classifying samples based only on their pairwise similarities may be divided into two sub-problems: measuring the similarity between samples and classifying the samples based on their pairwise similarities. It is beyond the scope of this chapter to discuss exhaustively and in detail various ways to measure similarity and various similarity-based ...
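The centroid-based SDA classifier described in the abstract can be illustrated with a minimal Python sketch. Several choices here are assumptions made only for illustration and are not stated in the abstract: the class-conditional model of each similarity is taken to be an exponential distribution over a finite set of similarity values, fitted so that its mean matches the empirical mean similarity; the centroid is taken to be the class member with the largest total similarity to its own class; and the similarity function `matches` (number of equal positions in two binary tuples) and all function names are hypothetical.

```python
import math

def matches(u, v):
    """Illustrative similarity: number of positions where two tuples agree."""
    return sum(a == b for a, b in zip(u, v))

def fit_exponential(target_mean, support):
    """Fit p(s) proportional to exp(lam * s) on a finite support so that
    its mean equals target_mean. Returns (lam, log normalizer)."""
    support = list(support)

    def stats(lam):
        shift = max(lam * s for s in support)          # for numerical stability
        w = [math.exp(lam * s - shift) for s in support]
        mean = sum(wi * s for wi, s in zip(w, support)) / sum(w)
        return mean, shift + math.log(sum(w))

    lo, hi = -50.0, 50.0                               # assumed search bracket
    for _ in range(200):                               # mean is increasing in lam
        mid = (lo + hi) / 2
        if stats(mid)[0] < target_mean:
            lo = mid
        else:
            hi = mid
    lam = (lo + hi) / 2
    return lam, stats(lam)[1]

def train(samples, labels, sim, support):
    """Pick one centroid per class and fit one similarity model per class pair."""
    classes = sorted(set(labels))
    members = {g: [x for x, y in zip(samples, labels) if y == g] for g in classes}
    centroids = {
        g: max(members[g], key=lambda c: sum(sim(c, z) for z in members[g]))
        for g in classes
    }
    params = {}
    for g in classes:              # class that generates the sample
        for h in classes:          # centroid the sample is compared with
            mean = sum(sim(z, centroids[h]) for z in members[g]) / len(members[g])
            params[g, h] = fit_exponential(mean, support)
    priors = {g: len(members[g]) / len(samples) for g in classes}
    return classes, centroids, params, priors, sim

def classify(x, model):
    """Return the class with the largest log prior plus log-likelihood."""
    classes, centroids, params, priors, sim = model

    def score(g):
        total = math.log(priors[g])
        for h in classes:
            lam, log_z = params[g, h]
            total += lam * sim(x, centroids[h]) - log_z
        return total

    return max(classes, key=score)

# Toy data: two classes of 4-bit tuples; similarities take values 0..4.
samples = [(1, 1, 1, 0), (1, 1, 0, 0), (1, 1, 1, 1),
           (0, 0, 0, 1), (0, 0, 1, 1), (0, 0, 0, 0)]
labels = ["a", "a", "a", "b", "b", "b"]
model = train(samples, labels, matches, range(5))

print(classify((1, 1, 1, 0), model))
print(classify((0, 1, 0, 1), model))
```

On this toy data the first test sample is assigned to class "a" and the second to class "b". The sketch uses only the pairwise similarities, never the feature vectors themselves, which is the point of a similarity-based generative classifier; any other similarity function with a known finite range could be substituted for `matches`.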
Similar resources
Relational discriminant analysis and its large sample size problem
Relational discriminant analysis is based on a similarity matrix of the training set. It is able to construct reliable nonlinear discriminants in infinite dimensional feature spaces based on small training sets. This technique has a large sample size problem as the size of the similarity matrix equals the square of the number of objects in the training set. In this paper we discuss and initiall...
Automatic Classification and Analysis Facility for Similarity Retrieval of Design Objects
Automatic classification and analysis facilities for a huge number of design objects, such as textures, save designers considerable time that they can spend on creation. Since users may interpret even the same photographs subjectively, based on their own experience and knowledge, everyone may have unique subjective criteria for judging similarity. We developed a tri-contrast paramet...
The Role of Different Types of Homograph Context in Determining Similarity Between Documents
Aim: Automatic information retrieval is based on the assumption that texts contain content or structural elements that can be used in word sense disambiguation, thereby improving the effectiveness of the retrieved results. Homographs are among the words requiring sense disambiguation. Depending on their roles and positions in texts, homograph contexts could be divided into different types, wit...
Near Duplicate Document Detection Using Document-Level Features and Supervised Learning
This paper addresses the problem of near-duplicate documents. It proposes a new method to detect near-duplicate documents in a large document collection. The method has three steps: feature selection, similarity measures, and a discriminant function. Feature selection performs pre-processing, calculates the weight of each term, and selects heavily weighted terms as features ...
Study of Similarity Measures with Linear Discriminant Analysis for Face Recognition
Face recognition systems have been an active research topic in image processing for a long time. Face recognition systems have been evaluated with various types of algorithms for feature extraction, classification, and matching. The similarity measure or distance measure is also an important factor in assessing the quality of a face recognition system. There...
View-Invariant Recognition of Action Style Self-Dissimilarity
Self-similarity was recently introduced as a measure of inter-class congruence for classification of actions. Herein, we investigate the dual problem of intra-class dissimilarity for classification of action styles. We introduce self-dissimilarity matrices that discriminate between same actions performed by different subjects regardless of viewing direction and camera parameters. We investigate...