A Solution to the Curse of Dimensionality Problem in Pairwise Scoring Techniques
Authors
Abstract
This paper provides a solution to the curse of dimensionality problem in the pairwise scoring techniques that are commonly used in bioinformatics and biometrics applications. It has recently been discovered that stacking the pairwise comparison scores between an unknown pattern and a set of known patterns can yield feature vectors with good discriminative properties for classification. However, this technique can lead to the curse of dimensionality because the vector size is equal to the training set size. To overcome this problem, this paper shows that the pairwise score matrices possess a symmetric and diagonally dominant property that allows us to select the most relevant features independently by an FDA-like technique. The paper then demonstrates the capability of the technique on a protein sequence classification problem. It was found that a 10-fold reduction in the number of feature dimensions and in recognition time can be achieved with only a 4% reduction in recognition accuracy.
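The idea in the abstract can be illustrated with a minimal sketch. It is not the paper's implementation: the abstract only says "FDA-like", so the per-feature Fisher ratio below is an assumed stand-in, and the names `pairwise_score_vectors`, `fisher_scores`, `select_top_k`, and `toy_score` are hypothetical. Each pattern is turned into a vector of scores against all training patterns, each score dimension is then ranked independently, and only the top `k` are kept.

```python
import numpy as np

def pairwise_score_vectors(patterns, references, score):
    """Stack score(x, r) over all references r into one feature vector per pattern."""
    return np.array([[score(x, r) for r in references] for x in patterns], dtype=float)

def fisher_scores(X, y):
    """Per-feature Fisher ratio: between-class scatter over within-class scatter.

    Assumption: a generic Fisher criterion, used here as a stand-in for the
    "FDA-like" selection the abstract mentions.
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y)
    overall_mean = X.mean(axis=0)
    between = np.zeros(X.shape[1])
    within = np.zeros(X.shape[1])
    for c in np.unique(y):
        Xc = X[y == c]
        class_mean = Xc.mean(axis=0)
        between += len(Xc) * (class_mean - overall_mean) ** 2
        within += ((Xc - class_mean) ** 2).sum(axis=0)
    # Small constant avoids division by zero for features with no within-class spread.
    return between / (within + 1e-12)

def select_top_k(X, y, k):
    """Return the (sorted) indices of the k features with the highest Fisher ratio."""
    order = np.argsort(fisher_scores(X, y))[::-1]
    return np.sort(order[:k])

def toy_score(a, b):
    """Hypothetical similarity: number of matching positions between two strings."""
    return sum(ch_a == ch_b for ch_a, ch_b in zip(a, b))

# Toy training set: the score matrix is N x N, so dimensionality equals training set size.
train = ["AAAA", "AAAT", "CCCC", "CCCG"]
labels = np.array([0, 0, 1, 1])
X_train = pairwise_score_vectors(train, train, toy_score)

keep = select_top_k(X_train, labels, k=2)   # retain 2 of the 4 score dimensions
X_reduced = X_train[:, keep]
```

At recognition time only the retained reference patterns would need to be scored against an unknown pattern, which is where a reduction in recognition time would come from. In a protein setting, `toy_score` would be replaced by a sequence alignment score.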
Similar Resources
A Multi Linear Discriminant Analysis Method Using a Subtraction Criteria
Linear dimension reduction has been used in various applications such as image processing and pattern recognition. All these methods fold the original data into vectors and project them onto a small number of dimensions. But in some applications we may face data that are not vectors, such as image data. Folding the multidimensional data into vectors causes the curse of dimensionality and mixes the differe...
Fusion of feature selection methods for pairwise scoring SVM
It has recently been discovered that stacking the pairwise comparison scores between an unknown pattern and a set of known patterns can yield feature vectors with good discriminative properties for classification. However, this technique can be hampered by the curse of dimensionality because the vector size is equal to the training set size. To overcome this problem, this paper investigat...
Metric-Based Shape Retrieval in Large Databases
This paper examines the problem of database organization and retrieval based on computing metric pairwise distances. A low-dimensional Euclidean approximation of a high-dimensional metric space is not efficient, while search in a high-dimensional Euclidean space suffers from the “curse of dimensionality”. Thus, techniques designed for searching metric spaces must be used. We evaluate several su...
A Discriminant Analysis for Undersampled Data
One of the inherent problems in pattern recognition is the undersampled data problem, also known as the curse of dimensionality reduction. In this paper a new algorithm called pairwise discriminant analysis (PDA) is proposed for pattern recognition. PDA, like linear discriminant analysis (LDA), performs dimensionality reduction and clustering, without suffering from undersampled data to the sam...
BFT: A Relational-based Bit Filtration Technique for Efficient Approximate String Joins in Biological Databases
Joining massive tables in relational databases has received substantial attention in the past decade. Numerous filtration and indexing techniques have been proposed to reduce the curse of dimensionality. This paper proposes a novel approach to map the problem of pairwise whole genome comparison into an approximate join operation in the well-established relational database context. We propose a ...