Feature Subset Selection Using the Wrapper Method: Overfitting and Dynamic Search Space Topology
Abstract
In the wrapper approach to feature subset selection, a search for an optimal set of features is made using the induction algorithm as a black box. The estimated future performance of the algorithm is the heuristic guiding the search. Statistical methods for feature subset selection, including forward selection, backward elimination, and their stepwise variants, can be viewed as simple hill-climbing techniques in the space of feature subsets. We utilize best-first search to find a good feature subset and discuss overfitting problems that may be associated with searching too many feature subsets. We introduce compound operators that dynamically change the topology of the search space to better utilize the information available from the evaluation of feature subsets. We show that compound operators unify previous approaches that deal with relevant and irrelevant features. The improved feature subset selection yields significant improvements for real-world datasets when using the ID3 and the Naive-Bayes induction algorithms.
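The search described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation, and it makes several assumptions: a 1-nearest-neighbour classifier stands in for the black-box induction algorithm (the paper uses ID3 and Naive-Bayes), 5-fold cross-validation accuracy serves as the estimate of future performance, and the data is a synthetic toy set in which only feature 0 carries class information. All names (`one_nn_predict`, `cv_accuracy`, `best_first_search`, `make_toy_data`, `max_stale`, `epsilon`) are hypothetical. Compound operators are not shown.

```python
import heapq
import random
from itertools import count


def one_nn_predict(train, features, x):
    """Stand-in inducer (assumption): 1-nearest neighbour on the chosen features."""
    def dist(row):
        return sum((row[0][f] - x[f]) ** 2 for f in features)
    return min(train, key=dist)[1]


def cv_accuracy(data, features, folds=5):
    """Estimate future accuracy of the inducer on `features` by k-fold CV."""
    correct = 0
    for k in range(folds):
        test = data[k::folds]
        train = [row for i, row in enumerate(data) if i % folds != k]
        correct += sum(one_nn_predict(train, features, x) == y for x, y in test)
    return correct / len(data)


def best_first_search(data, n_features, max_stale=5, epsilon=1e-9):
    """Best-first search over feature subsets, guided by the CV estimate.

    Each node is a frozenset of feature indices; its children add or delete
    one feature. The search stops after `max_stale` consecutive expansions
    that fail to improve on the best subset found so far.
    """
    tie = count()  # tie-breaker so the heap never compares frozensets
    start = frozenset()
    best, best_score = start, cv_accuracy(data, start)
    open_heap = [(-best_score, next(tie), start)]
    seen = {start}
    stale = 0
    while open_heap and stale < max_stale:
        neg_score, _, node = heapq.heappop(open_heap)
        if -neg_score > best_score + epsilon:
            best, best_score, stale = node, -neg_score, 0
        else:
            stale += 1
        for f in range(n_features):
            child = node ^ {f}  # toggle feature f: add it or delete it
            if child not in seen:
                seen.add(child)
                score = cv_accuracy(data, child)
                heapq.heappush(open_heap, (-score, next(tie), child))
    return best, best_score


def make_toy_data(n=60, n_features=4, seed=0):
    """Toy data (assumption): feature 0 tracks the class, the rest are noise."""
    rng = random.Random(seed)
    data = []
    for _ in range(n):
        y = rng.randint(0, 1)
        x = [y + rng.gauss(0, 0.05)] + [rng.random() for _ in range(n_features - 1)]
        data.append((x, y))
    return data


if __name__ == "__main__":
    subset, estimate = best_first_search(make_toy_data(), n_features=4)
    print(sorted(subset), estimate)
```

Because the same cross-validation estimate both guides the search and scores the winner, evaluating many subsets can make the chosen subset look better than it is. That is the overfitting risk the abstract raises, so in practice the final subset would be judged on data held out from the search.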
Similar resources
Feature Subset Selection Using the Wrapper Method: Overfitting and Dynamic Search Space Topology (appears in the First International Conference on Knowledge Discovery and Data Mining, KDD-95)
In the wrapper approach to feature subset selection, a search for an optimal set of features is made using the induction algorithm as a black box. The estimated future performance of the algorithm is the heuristic guiding the search. Statistical methods for feature subset selection including forward selection, backward elimination, and their stepwise variants can be viewed as simple hill-climbi...
Fast SFFS-Based Algorithm for Feature Selection in Biomedical Datasets
Biomedical datasets usually include a large number of features relative to the number of samples. However, some data dimensions may be less relevant or even irrelevant to the output class. Selection of an optimal subset of features is critical, not only to reduce the processing cost but also to improve the classification results. To this end, this paper presents a hybrid method of filter and wr...
Feature Selection Using Multi Objective Genetic Algorithm with Support Vector Machine
Different approaches have been proposed for feature selection to obtain a suitable feature subset from among all features. These methods search the feature space for feature subsets that satisfy some criteria or optimize several objective functions. The objective functions are divided into two main groups: filter and wrapper methods. In filter methods, feature subsets are selected due to some measu...
Acoustic group feature selection using wrapper method for automatic eating condition recognition
In this paper, we present a wrapper-based acoustic group feature selection system for the INTERSPEECH 2015 Computational Paralinguistics Challenge (ComParE) 2015, Eating Condition (EC) Sub-challenge. The wrapper-based method has two components: the feature subset evaluation and the feature space search. The feature subset evaluation is performed using Support Vector Machine (SVM) classifiers. T...