Creating diverse nearest-neighbour ensembles using simultaneous metaheuristic feature selection
نویسندگان
چکیده
The nearest-neighbour (1NN) classifier has long been used in pattern recognition, exploratory data analysis, and data mining problems. A vital consideration in obtaining good results with this technique is the choice of distance function, and correspondingly which features to consider when computing distances between samples. In recent years there has been an increasing interest in creating ensembles of classifiers in order to improve classification accuracy. This paper proposes a new ensemble technique which combines multiple 1NN classifiers, each using a different distance function, and potentially a different set of features (feature vector). These feature vectors are determined for each distance metric simultaneously using Tabu Search to minimise the ensemble error rate. We show that this approach implicitly selects for a diverse set of classifiers, and by doing so achieves greater performance improvements than can be achieved by treating the classifies independently, or using a single feature set. Naturally, optimising a the level of ensembles necessitates a much larger solution space, to make this approach tractable, we show how Tabu Search at the ensemble level can be hybridised with local search at the level of individual classifiers. The proposed ensemble classifier with different distance metrics and different feature vectors is evaluated using various benchmark data sets from UCI Machine Learning Repository and a real-world machine-vision application. Results have indicated a significant increase in the performance when compared with various well-known classifiers. Furthermore, the proposed ensemble method is also compared with ensemble classifier using different distance metrics but with same feature vector (with or without Feature Selection (FS)).
منابع مشابه
Ensembles of Nearest Neighbours for Cancer Classification Using Gene Expression Data
It is known that an ensemble of classifiers can outperform a single best classifier if classifiers in the ensemble are sufficiently diverse (i.e., their errors are as much uncorrelated as possible) and accurate. We study ensembles of nearest neighbours for cancer classification based on gene expression data. Such ensembles have been rarely used, because the traditional ensemble methods such as ...
متن کاملEnsembles of nearest neighbour classifiers and serial analysis of gene expression
In this paper, we represent experimental results obtained with ensembles of nearest neighbour classifiers on the binary classification problem of cancer classification using serial analysis of gene expression (SAGE) data. Nearest neighbours are selected as classifiers since they were rarely employed in building ensembles because their predictions are stable to small perturbations of data, which...
متن کاملScatter Search for the Feature Selection Problem
The feature selection problem in the field of classification consists of obtaining a subset of variables to optimally realize the task without taking into account the remainder variables. This work presents how the search for this subset is performed using the Scatter Search metaheuristic and is compared with two traditional strategies in the literature: the Forward Sequential Selection (FSS) a...
متن کاملAn Evaluation of Alternative Feature Selection Strategies and Ensemble Techniques for Classifying Music
Automatic annotation of music files is a key problem in multimedia information retrieval. In this paper we present a solution to this problem that addresses the issues of feature extraction, feature selection and design of classifier. We outline a process for feature extraction based on the discrete wavelet packet transform and we evaluate a variety of wrapper-based feature subset selection str...
متن کاملذخیره در منابع من
با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید
برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید
ثبت ناماگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید
ورودعنوان ژورنال:
- Pattern Recognition Letters
دوره 31 شماره
صفحات -
تاریخ انتشار 2010