On the Number of Partial Least Squares Components in Dimension Reduction for Tumor Classification
Abstract
Dimension reduction is important in the analysis of gene expression microarray data, because the high dimensionality of the data sets hurts the generalization performance of classifiers. Partial Least Squares (PLS) based dimension reduction is frequently used, since it is well suited to high-dimensional data sets and leads to satisfactory classification performance. This paper investigates how the number of PLS components affects generalization performance, and how classification performance relates to the regression quality of PLS on the training set. Experimental results show that the number of PLS components passed to classifiers can be determined automatically from the regression quality of the PLS latent variables.
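The idea of choosing the number of components from regression quality on the training set can be sketched as follows. This is a minimal illustration and not the paper's procedure: the abstract does not state the exact criterion, so the use of training R^2 as the "regression quality" measure, the stopping rule, the `min_gain` threshold, and the function names `pls1_r2_curve` and `choose_n_components` are all assumptions made here. The sketch uses a plain PLS1 (single response, NIPALS-style deflation) with class labels coded as 0/1.

```python
from math import sqrt


def pls1_r2_curve(X, y, max_components):
    """Return training R^2 after 1, 2, ... PLS1 components (hypothetical helper)."""
    n, p = len(X), len(X[0])
    # Center the predictors and the response.
    col_means = [sum(row[j] for row in X) / n for j in range(p)]
    Xr = [[row[j] - col_means[j] for j in range(p)] for row in X]
    y_mean = sum(y) / n
    yr = [v - y_mean for v in y]
    ss_tot = sum(v * v for v in yr)
    if ss_tot == 0:
        raise ValueError("response is constant; R^2 is undefined")

    r2_curve = []
    for _ in range(max_components):
        # Weight vector: covariance direction between residual X and residual y.
        w = [sum(Xr[i][j] * yr[i] for i in range(n)) for j in range(p)]
        norm = sqrt(sum(v * v for v in w))
        if norm < 1e-12:
            break  # nothing left in X that covaries with y
        w = [v / norm for v in w]
        # Score vector (the latent variable) and its squared length.
        t = [sum(Xr[i][j] * w[j] for j in range(p)) for i in range(n)]
        tt = sum(v * v for v in t)
        if tt < 1e-12:
            break
        # X loadings and the regression coefficient of y on the score.
        load = [sum(Xr[i][j] * t[i] for i in range(n)) / tt for j in range(p)]
        q = sum(yr[i] * t[i] for i in range(n)) / tt
        # Deflate X and y by the part explained by this component.
        Xr = [[Xr[i][j] - t[i] * load[j] for j in range(p)] for i in range(n)]
        yr = [yr[i] - q * t[i] for i in range(n)]
        r2_curve.append(1.0 - sum(v * v for v in yr) / ss_tot)
    return r2_curve


def choose_n_components(r2_curve, min_gain=0.01):
    """Assumed rule: keep adding components while R^2 improves by min_gain."""
    chosen, previous = 0, 0.0
    for k, r2 in enumerate(r2_curve, start=1):
        if r2 - previous < min_gain:
            break
        chosen, previous = k, r2
    return max(chosen, 1)  # always keep at least one component


# Toy data (made up for illustration): 6 samples, 3 "genes", 2 classes.
X_toy = [[2.0, 0.1, 1.0], [1.8, 0.3, 0.9], [2.2, 0.2, 1.1],
         [0.5, 0.2, 1.0], [0.4, 0.1, 0.8], [0.6, 0.3, 1.2]]
y_toy = [1, 1, 1, 0, 0, 0]
curve = pls1_r2_curve(X_toy, y_toy, max_components=3)
print(curve, choose_n_components(curve))
```

Training R^2 never decreases as components are added, so a rule of this kind needs a gain threshold (or a similar cutoff) rather than simply maximizing R^2; the threshold value used here is arbitrary.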
Similar resources
On partial least squares dimension reduction for microarray-based classification: a simulation study
In microarray tumor tissue classification studies, the expressions of thousands of genes (variables) are simultaneously measured across a few tissue samples. Standard statistical methodologies in classification do not work well when the dimension, p, is greater than the sample size, N. One approach to classification problems, when p > N, is to first apply a dimension reduction method and then perfo...
Tumor classification by partial least squares using microarray gene expression data
MOTIVATION: One important application of gene expression microarray data is classification of samples into categories, such as the type of tumor. The use of microarrays allows simultaneous monitoring of thousands of gene expressions per sample. This ability to measure gene expression en masse has resulted in data with the number of variables p (genes) far exceeding the number of samples N. Stand...
Orthogonal Projection Weights in Dimension Reduction based on Partial Least Squares
Dimension reduction is important in the analysis of gene expression microarray data, because the high dimensionality of the data set hurts the generalization performance of classifiers. Partial least squares based dimension reduction (PLSDR) is frequently used, since it is well suited to high-dimensional data sets and leads to satisfactory classification performance. However,...
PLS dimension reduction for classification with microarray data.
Partial Least Squares (PLS) dimension reduction is known to give good prediction accuracy in the context of classification with high-dimensional microarray data. In this paper, the classification procedure consisting of PLS dimension reduction and linear discriminant analysis on the new components is compared with some of the best state-of-the-art classification methods. Moreover, a boosting al...
Sparse partial least squares classification for high dimensional data.
Partial least squares (PLS) is a well-known dimension reduction method which has recently been adapted for high-dimensional classification problems in genome biology. We develop sparse versions of two recently proposed PLS-based classification methods using sparse partial least squares (SPLS). These sparse versions aim to achieve variable selection and dimension reduction simultaneously. We...