A stability index for feature selection
نویسنده
چکیده
Sequential forward selection (SFS) is one of the most widely used feature selection procedures. It starts with an empty set and adds one feature at each step. The estimate of the quality of the candidate subsets usually depends on the training/testing split of the data. Therefore different sequences of features may be returned from repeated runs of SFS. A substantial discrepancy between such sequences will signal a problem with the selection. A stability index is proposed here based on cardinality of the intersection and a correction for chance. The experimental results with 10 real data sets indicate that the index can be useful for selecting the final feature subset. If stability is high, then we should return a subset of features based on their total rank across the SFS runs. If stability is low, then it is better to return the feature subset which gave the minimum error across all SFS runs.
منابع مشابه
Neuro-Fuzzy Based Algorithm for Online Dynamic Voltage Stability Status Prediction Using Wide-Area Phasor Measurements
In this paper, a novel neuro-fuzzy based method combined with a feature selection technique is proposed for online dynamic voltage stability status prediction of power system. This technique uses synchronized phasors measured by phasor measurement units (PMUs) in a wide-area measurement system. In order to minimize the number of neuro-fuzzy inputs, training time and complication of neuro-fuzzy ...
متن کاملEvaluation of Mutual information versus Gini index for stable feature selection
The selection of highly discriminatory features has been crucial in aiding further advancements in domains such as biomedical sciences, high-energy physics and e-commerce. Therefore evaluation of the robustness of feature selection methods to small perturbations in the data, known as feature selection stability, is of great importance to people in these respective fields. However, little resear...
متن کاملStability of Three Forms of Feature Selection Methods on Software Engineering Data
One of the major challenges when working with software metrics datasets is that some metrics may be redundant or irrelevant to software defect prediction. This may be addressed using feature (metric) selection, which chooses an appropriate subset of features for use in downstream computation. There are three major forms of feature selection: filter-based feature rankers, which uses statistical ...
متن کاملFeature selection using genetic algorithm for breast cancer diagnosis: experiment on three different datasets
Objective(s): This study addresses feature selection for breast cancer diagnosis. The present process uses a wrapper approach using GA-based on feature selection and PS-classifier. The results of experiment show that the proposed model is comparable to the other models on Wisconsin breast cancer datasets. Materials and Methods: To evaluate effectiveness of proposed feature selection method, we ...
متن کامل