Improving Incremental Wrapper-Based Subset Selection via Replacement and Early Stopping
Abstract
This paper addresses feature subset selection in classification datasets with a very large number of attributes. In such datasets, classical wrapper approaches become intractable because of the large number of wrapper evaluations they require. One way to alleviate this problem is the so-called filter-wrapper approach, or Incremental Wrapper-based Subset Selection (IWSS): a filter measure is used to rank the predictive attributes, and a wrapper search then follows that ranking. In this way, the number of wrapper evaluations is linear in the number of predictive attributes. We present two contributions to the IWSS approach. The first aims at obtaining more compact subsets: it allows not only the addition of new attributes but also their interchange with attributes already included in the selected subset. The second, termed early stopping, sets an adaptive threshold on the number of attributes in the ranking to be considered. The advantages of these new approaches are analyzed both theoretically and experimentally. Results over a set of 12 high-dimensional datasets corroborate the success of our proposals.
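As a rough illustration of the procedure the abstract describes, the following is a minimal sketch and not the authors' implementation. The function name `iwss`, its parameters, the toy scores, and the acceptance rule (plain strict improvement of the wrapper score) are all assumptions made for this example; the replacement move shown is one plausible reading of "interchange", and early stopping is not sketched because the abstract does not specify how the threshold is set.

```python
from typing import Callable, FrozenSet, List, Sequence, Tuple


def iwss(
    features: Sequence[str],
    filter_score: Callable[[str], float],
    wrapper_score: Callable[[FrozenSet[str]], float],
    allow_replacement: bool = False,
) -> Tuple[List[str], float, int]:
    """Return (selected subset, its wrapper score, number of wrapper evaluations)."""
    # Filter step: rank attributes from most to least relevant.
    ranking = sorted(features, key=filter_score, reverse=True)
    selected: List[str] = []
    best = float("-inf")
    evaluations = 0
    for f in ranking:
        # Always try adding f to the current subset.
        candidates = [selected + [f]]
        if allow_replacement:
            # Assumed replacement move: swap f with each selected attribute.
            candidates += [
                selected[:i] + [f] + selected[i + 1:] for i in range(len(selected))
            ]
        chosen = None
        for cand in candidates:
            evaluations += 1
            score = wrapper_score(frozenset(cand))
            if score > best:  # simplification: plain strict improvement
                best, chosen = score, cand
        if chosen is not None:
            selected = chosen
    return selected, best, evaluations


# Toy data (hypothetical): the filter over-rates "a"; only "b" and "c" help.
FILTER = {"a": 0.9, "b": 0.8, "c": 0.5, "d": 0.1}


def toy_wrapper(subset: FrozenSet[str]) -> float:
    # Stand-in for cross-validated accuracy: reward useful attributes, penalise size.
    return 10 * len(subset & {"b", "c"}) - len(subset)


plain = iwss(list(FILTER), FILTER.get, toy_wrapper)
swap = iwss(list(FILTER), FILTER.get, toy_wrapper, allow_replacement=True)
print(plain)  # (['a', 'b', 'c'], 17, 4)
print(swap)   # (['b', 'c'], 18, 8)
```

In this toy run the plain variant spends exactly one wrapper evaluation per attribute, which is the linear cost mentioned above, but it keeps the over-rated attribute "a". The variant with swaps spends more evaluations and ends with a smaller subset.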
Similar resources
Overfitting in Wrapper-Based Feature Subset Selection: The Harder You Try the Worse it Gets
In wrapper-based feature selection, the more states that are visited during the search phase of the algorithm, the greater the likelihood of finding a feature subset that has a high internal accuracy while generalizing poorly. When this occurs, we say that the algorithm has overfitted to the training data. We outline a set of experiments to show this and we introduce a modified genetic algorithm...
Fast SFFS-Based Algorithm for Feature Selection in Biomedical Datasets
Biomedical datasets usually include a large number of features relative to the number of samples. However, some data dimensions may be less relevant or even irrelevant to the output class. Selection of an optimal subset of features is critical, not only to reduce the processing cost but also to improve the classification results. To this end, this paper presents a hybrid method of filter and wr...
SFLA Based Gene Selection Approach for Improving Cancer Classification Accuracy
In this paper, we propose a new gene selection algorithm based on the Shuffled Frog Leaping Algorithm, called SFLA-FS. The proposed algorithm is used to improve cancer classification accuracy. Most biological datasets, such as cancer datasets, have a large number of genes and few samples. However, most of these genes are not usable in some tasks, for example in cancer classification....
Fast wrapper feature subset selection in high-dimensional datasets by means of filter re-ranking
This paper deals with the problem of supervised wrapper-based feature subset selection in datasets with a very large number of attributes. Recently the literature has contained numerous references to the use of hybrid selection algorithms: based on a filter ranking, they perform an incremental wrapper selection over that ranking. Though working fine, these methods still have their problems: (1)...
Using Early Stopping to Reduce Overfitting in Wrapper-Based Feature Weighting
It is acknowledged that overfitting can occur in feature selection using the wrapper method when there is a limited amount of training data available. It has also been shown that the severity of overfitting is related to the intensity of the search algorithm used during this process. We demonstrate that the problem of overfitting in feature weighting can be exacerbated if the feature weighting ...
Journal title:
- IJPRAI
Volume 25
Publication date: 2011