Dimensionality Reduction and Microarray data
Authors
Abstract
Microarrays are currently used to measure the expression levels of thousands of genes simultaneously. They present new analytical challenges because they combine a very high input dimension with a very low sample size. Analysing multi-dimensional data with complex geometry is difficult, as is identifying the low-dimensional “principal objects” that give the optimal projection while losing the least information. Several methods have been proposed for dimensionality reduction of microarray data, including principal component analysis and principal manifolds. This article presents a comparison study of linear principal component analysis and the nonlinear local tangent space alignment principal manifold method on this problem. Two microarray data sets will be used in this study. A classification model will be created using both the full-dimensional and the dimensionality-reduced data sets. To measure the amount of information lost by each of the two dimensionality reduction methods, the performance of each method will be measured in terms of the level of generalisation achieved by the classification models on previously unseen data. These results will be compared with those obtained using the full-dimensional data sets.
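The evaluation protocol described in the abstract can be illustrated with a minimal sketch. Everything specific here is an assumption made for illustration and is not taken from the article: the data are synthetic rather than real microarray measurements, the classifier is a simple nearest-centroid rule (the abstract does not name its classification model), and only the linear PCA branch is shown. The helper names `pca_fit`, `pca_transform`, `nearest_centroid_accuracy`, `make_synthetic`, and `compare` are hypothetical.

```python
import numpy as np


def pca_fit(X, k):
    """Return the training mean and the top-k principal directions of X."""
    mean = X.mean(axis=0)
    # SVD of the centred data; the rows of Vt are the principal directions.
    _, _, Vt = np.linalg.svd(X - mean, full_matrices=False)
    return mean, Vt[:k]


def pca_transform(X, mean, components):
    """Project X onto the principal directions learned from the training set."""
    return (X - mean) @ components.T


def nearest_centroid_accuracy(X_train, y_train, X_test, y_test):
    """Assign each test sample to the class with the closest training centroid."""
    classes = np.unique(y_train)
    centroids = np.stack([X_train[y_train == c].mean(axis=0) for c in classes])
    dists = np.linalg.norm(X_test[:, None, :] - centroids[None, :, :], axis=2)
    pred = classes[dists.argmin(axis=1)]
    return float((pred == y_test).mean())


def make_synthetic(n_samples=60, n_genes=500, n_informative=20, seed=0):
    """Synthetic stand-in for microarray data: few samples, many 'genes'."""
    rng = np.random.default_rng(seed)
    y = np.arange(n_samples) % 2               # two balanced classes
    X = rng.normal(size=(n_samples, n_genes))
    X[:, :n_informative] += 2.0 * y[:, None]   # shift a few genes for class 1
    return X, y


def compare(k=5, seed=0):
    """Held-out accuracy on full-dimensional data versus PCA-reduced data."""
    X, y = make_synthetic(seed=seed)
    idx = np.random.default_rng(seed + 1).permutation(len(y))
    train, test = idx[:40], idx[40:]

    full_acc = nearest_centroid_accuracy(X[train], y[train], X[test], y[test])

    # Fit PCA on the training split only, so the test split stays unseen.
    mean, comps = pca_fit(X[train], k)
    Z_train = pca_transform(X[train], mean, comps)
    Z_test = pca_transform(X[test], mean, comps)
    reduced_acc = nearest_centroid_accuracy(Z_train, y[train], Z_test, y[test])
    return full_acc, reduced_acc


if __name__ == "__main__":
    full_acc, reduced_acc = compare()
    print(f"full-dimensional accuracy: {full_acc:.2f}")
    print(f"PCA-reduced accuracy:      {reduced_acc:.2f}")
```

The gap between the two accuracies serves as a rough proxy for the information lost in the projection. A nonlinear comparison along the same lines could replace the PCA step with a local tangent space alignment embedding, for example scikit-learn's `LocallyLinearEmbedding` with `method="ltsa"`; that variant is not shown here.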
Similar resources
Developing a Filter-Wrapper Feature Selection Method and its Application in Dimension Reduction of Gen Expression
Nowadays, the growing volume of data and number of attributes in datasets has reduced the accuracy of learning algorithms and increased their computational complexity. Feature selection is one dimensionality reduction method, and it is carried out through filtering and wrapping. Wrapper methods are more accurate than filter methods, but filter methods run faster and have a lower computational burden. With ...
Feature dimension reduction for microarray data analysis using locally linear embedding
Cancer classification is one major application of microarray data analysis. Because microarray data are ultra-high-dimensional, data dimension reduction has drawn special attention for this type of analysis. The currently available data dimension reduction methods are either supervised, where data need to be labeled, or computationally complex. In this paper, we proposed to use a...
A Novel Dimensionality Reduction Technique Based on Independent Component Analysis for Modeling Microarray Gene Expression Data
DNA microarray experiments, which generate thousands of gene expression measurements, are being used to gather information from tissue and cell samples regarding gene expression differences that will be useful in diagnosing disease. One challenge of microarray studies, however, is that the number n of samples collected is relatively small compared to the number p of genes per sample which are usu...
Efficient Retrieval Technique for Microarray Gene Expression
DNA microarray gene data consists of the expression levels of thousands of genes for a small number of samples. Extracting the required knowledge from microarray gene data remains an open challenge. Acquiring knowledge from such gene data is intricate, although a growing body of research aims to extract information from it. In order to retrieve t...
Classification of Microarrays with kNN: Comparison of Dimensionality Reduction Methods
Dimensionality reduction can often improve the performance of the k-nearest neighbor classifier (kNN) for high-dimensional data sets, such as microarrays. The effect of the choice of dimensionality reduction method on the predictive performance of kNN for classifying microarray data is an open issue, and four common dimensionality reduction methods, Principal Component Analysis (PCA), Random Pr...