A Solution to the Curse of Dimensionality Problem in Pairwise Scoring Techniques

Authors

  • Man-Wai Mak
  • Sun-Yuan Kung

Abstract

This paper provides a solution to the curse of dimensionality problem in the pairwise scoring techniques that are commonly used in bioinformatics and biometrics applications. It has recently been discovered that stacking the pairwise comparison scores between an unknown pattern and a set of known patterns can produce feature vectors with good discriminative properties for classification. However, such a technique can lead to the curse of dimensionality because the vector size is equal to the training set size. To overcome this problem, this paper shows that the pairwise score matrices possess a symmetric and diagonally dominant property that allows the most relevant features to be selected independently by an FDA-like technique. The paper then demonstrates the capability of the technique on a protein sequence classification problem. It was found that a 10-fold reduction in the number of feature dimensions and in recognition time can be achieved with only a 4% reduction in recognition accuracy.
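The abstract describes two steps: representing each pattern by its vector of pairwise scores against the training set, then keeping only the most discriminative dimensions with an FDA-like criterion. The sketch below is a minimal illustration under stated assumptions, not the authors' implementation: it reads FDA as Fisher discriminant analysis and uses a per-dimension two-class Fisher ratio, and it substitutes an RBF similarity between numeric tuples for the pairwise comparison score (a protein application would use sequence alignment scores instead). All function names (`pairwise_score`, `score_vector`, `score_matrix`, `fisher_ratio`, `select_features`) and the toy data are hypothetical.

```python
import math
from statistics import mean, pvariance


def pairwise_score(x, y, gamma=1.0):
    # Hypothetical stand-in for a pairwise comparison score (e.g. an alignment
    # score between protein sequences): an RBF similarity between numeric tuples.
    d2 = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-gamma * d2)


def score_vector(x, known, gamma=1.0):
    # Stack the scores of x against every known pattern; dimension == len(known).
    return [pairwise_score(x, k, gamma) for k in known]


def score_matrix(train, gamma=1.0):
    # Row i is the score vector of training pattern i.
    return [score_vector(x, train, gamma) for x in train]


def fisher_ratio(values, labels, positive):
    # Two-class Fisher ratio of a single feature: squared difference of class
    # means over the sum of class variances. Both classes must be non-empty.
    pos = [v for v, c in zip(values, labels) if c == positive]
    neg = [v for v, c in zip(values, labels) if c != positive]
    within = pvariance(pos) + pvariance(neg)
    return (mean(pos) - mean(neg)) ** 2 / (within + 1e-12)


def select_features(train, labels, positive, k, gamma=1.0):
    # Rank each score dimension (column) independently and keep the top k.
    S = score_matrix(train, gamma)
    n = len(train)
    ratios = [fisher_ratio([S[i][j] for i in range(n)], labels, positive)
              for j in range(n)]
    ranked = sorted(range(n), key=lambda j: ratios[j], reverse=True)
    return sorted(ranked[:k])


# Toy two-class data (hypothetical).
train = [(0.0, 0.0), (0.2, 0.1), (0.1, 0.3), (3.0, 3.0), (3.2, 2.9), (2.8, 3.1)]
labels = ["A", "A", "A", "B", "B", "B"]
keep = select_features(train, labels, positive="A", k=2)

# A new pattern is compared only with the selected training patterns, so both
# the feature dimension and the number of score computations shrink.
query = (0.1, 0.2)
reduced = score_vector(query, [train[j] for j in keep])
print(keep, reduced)
```

Ranking each column on its own is cheap; the abstract's claim is that the symmetric, diagonally dominant structure of the score matrix is what justifies selecting features independently. A multi-class problem would need, for example, a one-versus-rest ratio per class.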

Similar resources

A Multi Linear Discriminant Analysis Method Using a Subtraction Criteria

Linear dimension reduction has been used in various applications such as image processing and pattern recognition. All these methods fold the original data into vectors and project them onto a low-dimensional space. But in some applications we may face data that are not vectors, such as image data. Folding multidimensional data into vectors causes the curse of dimensionality and mixes the differe...

Fusion of feature selection methods for pairwise scoring SVM

It has recently been discovered that stacking the pairwise comparison scores between an unknown pattern and a set of known patterns can produce feature vectors with good discriminative properties for classification. However, such a technique can be hampered by the curse of dimensionality because the vector size is equal to the training set size. To overcome this problem, this paper investigat...

Metric-Based Shape Retrieval in Large Databases

This paper examines the problem of database organization and retrieval based on computing metric pairwise distances. A low-dimensional Euclidean approximation of a high-dimensional metric space is not efficient, while search in a high-dimensional Euclidean space suffers from the “curse of dimensionality”. Thus, techniques designed for searching metric spaces must be used. We evaluate several su...

A Discriminant Analysis for Undersampled Data

One of the inherent problems in pattern recognition is the undersampled data problem, also known as the curse of dimensionality reduction. In this paper a new algorithm called pairwise discriminant analysis (PDA) is proposed for pattern recognition. PDA, like linear discriminant analysis (LDA), performs dimensionality reduction and clustering, without suffering from undersampled data to the sam...

BFT: A Relational-based Bit Filtration Technique for Efficient Approximate String Joins in Biological Databases

Joining massive tables in relational databases has received substantial attention in the past decade. Numerous filtration and indexing techniques have been proposed to reduce the curse of dimensionality. This paper proposes a novel approach to map the problem of pairwise whole-genome comparison into an approximate join operation in the well-established relational database context. We propose a ...

Publication date: 2006