Classification and Biomarker Genes Selection for Cancer Gene Expression Data Using Random Forest

Authors

Ali Najafi (Molecular Biology Research Center, Baqiyatallah University of Medical Sciences, Tehran, Iran)

Malihe Ram (Dept. of Biostatistics, Public Health School, Mashhad University of Medical Sciences, Mashhad, Iran)

Mohammad Taghi Shakeri (Dept. of Biostatistics, Public Health School, Mashhad University of Medical Sciences, Mashhad, Iran)

Abstract:

Background & objective: Microarray and next-generation sequencing (NGS) data are important sources for finding informative molecular patterns. However, the very large number of genes measured in expression data makes it harder to identify the biomarkers associated with cancer. Random forest (RF) handles large-p, small-n problems effectively, that is, problems in which the number of genes (p) far exceeds the number of samples (n). RF can therefore be used to select and rank genes for the diagnosis and effective treatment of cancer.

Methods: Microarray gene expression data for colon, leukemia, and prostate cancers were collected from public databases. The data were preprocessed with the limma package, and the RF classification method was then applied to each data set separately in R. Finally, the genes selected for each cancer were compared with those reported in previous experimental studies, and their functions in molecular cancer processes were assessed.

Results: The RF method extracted very small sets of genes while retaining its predictive performance. For the colon cancer data set, the key genes DIEXF, GUCA2A, CA7, and IGHA1 were selected, with an accuracy of 87.39% and a precision of 85.45%. For prostate cancer, the SNCA, USP20, and SNRPA1 genes were selected, with an accuracy of 73.33% and a precision of 66.67%. For the leukemia data set, the key genes were BAG4, ANKHD1-EIF4EBP3, PLXNC1, and PCDH9, with an accuracy of 100% and a precision of 95.24%.

Conclusion: Most of the selected genes have previously been reported to be involved in cancer-related processes and pathways, and they play an important role in the transition from normal to abnormal cells.
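The accuracy and precision figures quoted in the abstract are presumably the standard confusion-matrix quantities, although the abstract does not say how they were computed. The following is a minimal Python sketch under that assumption, with a binary tumor/normal labelling. The function names and the label vectors are hypothetical illustrations; they are not data or code from the study, which was carried out in R.

```python
from typing import Sequence, Tuple


def confusion_counts(y_true: Sequence[int], y_pred: Sequence[int]) -> Tuple[int, int, int, int]:
    """Return (tp, fp, tn, fn) for binary labels, where 1 = tumor and 0 = normal."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # tumor predicted as tumor
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # normal predicted as tumor
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)  # normal predicted as normal
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # tumor predicted as normal
    return tp, fp, tn, fn


def accuracy(tp: int, fp: int, tn: int, fn: int) -> float:
    """Percentage of all samples that were classified correctly."""
    total = tp + fp + tn + fn
    if total == 0:
        raise ValueError("no samples")
    return 100.0 * (tp + tn) / total


def precision(tp: int, fp: int) -> float:
    """Percentage of samples predicted as tumor that really are tumor."""
    if tp + fp == 0:
        raise ValueError("no positive predictions")
    return 100.0 * tp / (tp + fp)


# Hypothetical held-out labels and classifier predictions (NOT from the paper).
y_true = [1, 1, 1, 1, 0, 0, 0, 0, 1, 0]
y_pred = [1, 1, 1, 0, 0, 0, 1, 0, 1, 0]

tp, fp, tn, fn = confusion_counts(y_true, y_pred)
print(f"accuracy = {accuracy(tp, fp, tn, fn):.2f}%, precision = {precision(tp, fp):.2f}%")
```

Accuracy counts every correct call, whereas precision looks only at samples predicted as tumor. The two can therefore diverge, as in the leukemia result, where accuracy is higher than precision.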

Similar articles

Random forest for gene selection and microarray data classification

A random forest method has been selected to perform both gene selection and classification of the microarray data. In this embedded method, the selection of smallest possible sets of genes with lowest error rates is the key factor in achieving highest classification accuracy. Hence, improved gene selection method using random forest has been proposed to obtain the smallest subset of genes as we...

Prediction of blood cancer using leukemia gene expression data and sparsity-based gene selection methods

Background: DNA microarray is a useful technology that simultaneously assesses the expression of thousands of genes. It can be utilized for the detection of cancer types and cancer biomarkers. This study aimed to predict blood cancer using leukemia gene expression data and a robust ℓ2,p-norm sparsity-based gene selection method. Materials and Methods: In this descriptive study, the microarray ...

Feature Selection for Cancer Classification Using Microarray Gene Expression Data

The DNA microarray technology enables us to measure the expression levels of thousands of genes simultaneously, providing great chance for cancer diagnosis and prognosis. The number of genes often exceeds tens of thousands, whereas the number of subjects available is often no more than a hundred. Therefore, it is necessary and important to perform gene selection for classification purpose. A go...

Random Forest for Gene Expression Based Cancer Classification: Overlooked Issues

Random forest is a collection (ensemble) of decision trees. It is a popular ensemble technique in pattern recognition. In this article, we apply random forest for cancer classification based on gene expression and address two issues that have been so far overlooked in other works. First, we demonstrate on two different real-world datasets that the performance of random forest is strongly influe...

SFLA Based Gene Selection Approach for Improving Cancer Classification Accuracy

In this paper, we propose a new gene selection algorithm based on Shuffled Frog Leaping Algorithm that is called SFLA-FS. The proposed algorithm is used for improving cancer classification accuracy. Most of the biological datasets such as cancer datasets have a large number of genes and few samples. However, most of these genes are not usable in some tasks for example in cancer classification....

Journal

Iranian Journal of Pathology

Volume 12, Issue 4

Pages 339-347

Publication date: 2017-12-01

Keywords

Microarray; Random Forest; Cancer; Gene Selection; Classification
