Application of committee kNN classifiers for gene expression profile classification

Authors

  • Manik Dhawan
  • Sudarshan Selvaraja
  • Zhong-Hui Duan
Abstract

In this study, we develop a two-class classification system based on a committee of k-Nearest Neighbour (kNN) classifiers. The system includes a sequence of simple data preprocessing steps. Each committee consists of 5 kNN classifiers of different architectures. Each classifier on the committee takes in a different set of features. The classification system is then applied to a set of microarray gene expression profiles from leukaemia patients. We show that the system can be effectively used for classifying microarray gene expression data. The results demonstrate the committee approach consistently outperforms individual kNN classifiers in terms of both classification accuracy and stability.
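The committee idea described in the abstract can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: the abstract does not state how the five classifiers' outputs are combined, which distance is used, or which values of k and feature subsets each member takes, so the majority vote, the Euclidean distance, the `members` list, and the toy data below are all assumptions made for the example.

```python
from collections import Counter
from math import dist


def knn_predict(train_X, train_y, x, k, features):
    """Classify x by majority vote among its k nearest training samples,
    using only the given feature indices (Euclidean distance, assumed)."""
    def project(row):
        return [row[i] for i in features]

    target = project(x)
    # Rank training samples by distance to x in the selected feature subspace.
    neighbours = sorted(
        zip(train_X, train_y),
        key=lambda pair: dist(project(pair[0]), target),
    )[:k]
    votes = Counter(label for _, label in neighbours)
    return votes.most_common(1)[0][0]


def committee_predict(train_X, train_y, x, members):
    """Combine member predictions by majority vote (an assumed rule).
    Each member is a (k, feature_indices) pair."""
    votes = Counter(
        knn_predict(train_X, train_y, x, k, features) for k, features in members
    )
    return votes.most_common(1)[0][0]


# Toy expression profiles (hypothetical, not from the paper): 4 features, 2 classes.
train_X = [
    [1.0, 1.2, 0.9, 5.0],
    [1.1, 0.8, 1.0, 5.2],
    [0.9, 1.0, 1.1, 4.8],
    [5.0, 5.1, 4.9, 1.0],
    [5.2, 4.9, 5.1, 0.9],
    [4.8, 5.0, 5.0, 1.1],
]
train_y = ["A", "A", "A", "B", "B", "B"]

# Five members, each with its own k and feature subset (assumed values).
members = [
    (1, [0, 1]),
    (3, [1, 2]),
    (3, [2, 3]),
    (1, [0, 3]),
    (3, [0, 1, 2, 3]),
]

print(committee_predict(train_X, train_y, [1.0, 1.0, 1.0, 5.0], members))
print(committee_predict(train_X, train_y, [5.0, 5.0, 5.0, 1.0], members))
```

Using an odd k and an odd number of members avoids tied votes in the two-class setting, which is one plausible reason for a committee of five.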

Similar resources

Effects of replacing the unreliable cDNA microarray measurements on the disease classification based on gene expression profiles and functional modules

MOTIVATION: Microarray datasets frequently contain a large number of missing values (MVs), which need to be estimated and replaced for subsequent data mining. The focus of the paper is to study the effects of different MV treatments for cDNA microarray data on disease classification analysis. RESULTS: By analyzing five datasets, we demonstrate that among three kinds of classifiers evaluated in...

An eco-informatics tool for microbial community studies: supervised classification of Amplicon Length Heterogeneity (ALH) profiles of 16S rRNA.

Support vector machines (SVM) and K-nearest neighbors (KNN) are two computational machine learning tools that perform supervised classification. This paper presents a novel application of such supervised analytical tools for microbial community profiling and to distinguish patterning among ecosystems. Amplicon length heterogeneity (ALH) profiles from several hypervariable regions of 16S rRNA ge...

Multiclass Boosting with Adaptive Group-Based kNN and Its Application in Text Categorization

AdaBoost is an excellent committee-based tool for classification. However, its effectiveness and efficiency in multiclass categorization face the challenges from methods based on support vector machine (SVM), neural networks (NN), naïve Bayes, and k-nearest neighbor (kNN). This paper uses a novel multi-class AdaBoost algorithm to avoid reducing the multi-class classification problem to multiple tw...

Hybridized KNN and SVM for gene expression data classification

Support vector machine (SVM) is one of the most powerful supervised learning algorithms in gene expression analysis. Samples intermixed with another class or lying in the overlapping boundary region may make the decision boundary too complex and may harm the precision of SVM. In the present paper, a hybrid of k-nearest neighbor (KNN) classifiers and SVM (HKNNSVM) is proposed to deal w...

Modification of the Fast Global K-means Using a Fuzzy Relation with Application in Microarray Data Analysis

Recognizing genes with distinctive expression levels can help in prevention, diagnosis and treatment of the diseases at the genomic level. In this paper, fast Global k-means (fast GKM) is developed for clustering the gene expression datasets. Fast GKM is a significant improvement of the k-means clustering method. It is an incremental clustering method which starts with one cluster. Iteratively ...

Journal:
  • International journal of bioinformatics research and applications

Volume 6, Issue 4

Pages: -

Publication date: 2010