Accurate Cancer Classification using Expressions of Very few Genes
Abstract
Gene expression profiling with microarrays has been used effectively for the classification and diagnostic prediction of cancer nodules. Several machine learning and data mining techniques are currently applied to identify cancer from gene expression data. However, these techniques were not designed to meet the particular needs of gene microarray analysis. First, microarray data have a high-dimensional feature space that often exceeds the number of samples by a factor of 100 or more. Second, microarray data contain a high degree of noise. Most existing techniques do not adequately address these problems of dimensionality and noise. Gene ranking methods were introduced to overcome them. Widely used gene ranking techniques include T-Score and ANOVA, but these can sometimes rank genes incorrectly when a large database is used. To address this, the paper proposes a technique called Enrichment Score for ranking genes. The classifier used in the proposed technique is a Support Vector Machine (SVM). Experiments on a lymphoma data set show better classification accuracy than the conventional method.
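The gene-ranking step described in the abstract can be illustrated with a minimal sketch. It uses the conventional t-score that the abstract names as a baseline, not the proposed Enrichment Score, which the abstract does not define. The function names (`t_score`, `rank_genes`) and the toy expression values are hypothetical and chosen only for illustration.

```python
from math import sqrt
from statistics import mean, variance


def t_score(a, b):
    """Welch's t-statistic between two groups of expression values."""
    se = sqrt(variance(a) / len(a) + variance(b) / len(b))
    return (mean(a) - mean(b)) / se if se > 0 else 0.0


def rank_genes(expr, labels):
    """Return gene indices sorted by decreasing absolute t-score.

    expr:   list of samples, each a list of per-gene expression values
    labels: list of 0/1 class labels, one per sample
    """
    n_genes = len(expr[0])
    scores = []
    for g in range(n_genes):
        pos = [row[g] for row, y in zip(expr, labels) if y == 1]
        neg = [row[g] for row, y in zip(expr, labels) if y == 0]
        scores.append(abs(t_score(pos, neg)))
    return sorted(range(n_genes), key=lambda g: scores[g], reverse=True)


# Toy data (made up): 6 samples x 3 genes; only gene 1 separates the classes.
expr = [
    [5.1, 9.8, 2.0],
    [4.9, 10.1, 2.2],
    [5.0, 10.0, 1.9],
    [5.2, 1.1, 2.1],
    [4.8, 0.9, 2.0],
    [5.0, 1.0, 2.3],
]
labels = [1, 1, 1, 0, 0, 0]

ranking = rank_genes(expr, labels)
top_genes = ranking[:1]  # keep only the highest-ranked gene
```

In a full pipeline, the expression values of the top-ranked genes would then be passed to a classifier such as an SVM; that step is omitted here to keep the sketch dependency-free.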
Similar resources
SFLA Based Gene Selection Approach for Improving Cancer Classification Accuracy
In this paper, we propose a new gene selection algorithm based on the Shuffled Frog Leaping Algorithm, called SFLA-FS. The proposed algorithm is used to improve cancer classification accuracy. Most biological datasets, such as cancer datasets, have a large number of genes and few samples. However, most of these genes are not usable in some tasks, for example in cancer classification....
Classification and Biomarker Genes Selection for Cancer Gene Expression Data Using Random Forest
Background & objective: Microarray and next generation sequencing (NGS) data are important sources for finding helpful molecular patterns. Also, the large volume of gene expression data increases the challenge of identifying the biomarkers associated with cancer. The random forest (RF) is used to effectively analyze the problems of large-p and smal...
Investigation of p53 and p27 expressions in the N-nitroso-N-methylurea-induced breast cancer in female Wistar Albino rats
Introduction: N-nitroso-N-methylurea (NMU) is a carcinogen from the nitrosamine family that has been used to induce breast cancer in rodents. This model of breast cancer is very similar to estrogen-dependent breast cancer in humans. As a continuation of our recent work, in the present study the expressions of both p53 and p27 were investigated in NMU-induced breast cancer in Wistar Albin...
Efficient Cancer Classification using Fast Adaptive Neuro-Fuzzy Inference System (FANFIS) based on Statistical Techniques
An increase in the number of cancer cases is being detected throughout the world. This creates the need for a new technique that can detect the occurrence of cancer, which will support better diagnosis and help reduce the number of cancer patients. This paper aims at finding the smallest set of genes that can ensure highly accurate classification of cancer from microarray data by using supervised ...
Noise-Based Feature Perturbation as a Selection Method for Microarray Data
DNA microarrays can monitor the expression levels of thousands of genes simultaneously, providing the opportunity to identify genes that are differentially expressed across different conditions. Microarray datasets are generally limited to a small number of samples with a large number of gene expressions; therefore, feature selection becomes a very important aspect of the microarra...