Margin-Based Feature Selection in Incomplete Data
Authors
Abstract
This study considers the problem of feature selection in incomplete data. The intuitive approach is to first impute the missing values and then apply a standard feature selection method to select relevant features. In this study, we show how to perform feature selection directly, without imputing missing values. We define the objective function of the uncertainty-margin-based feature selection method to maximize each instance’s uncertainty margin in its own relevant subspace. During optimization, we take into account the uncertainty of each instance due to its missing values. Experimental results on synthetic data and six benchmark data sets with few missing values (less than 25%) provide evidence that our method selects features as accurately as alternative methods that apply an imputation method first. However, when a large fraction of values (more than 25%) is missing, our feature selection method outperforms these imputation-based alternatives.
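To illustrate the general idea (this is a minimal sketch, not the paper's exact objective), the code below computes Relief-style hypothesis-margin feature weights while handling missing entries directly: any per-feature squared difference that involves a missing value is replaced by that feature's expected squared difference over observed values, so no imputation step is required. All function names and the 30% missing-rate demo are assumptions for the example.

```python
import numpy as np

def expected_sq_diff(X):
    """Per-feature fallback: expected squared difference of two observed values."""
    fallback = np.ones(X.shape[1])
    for f in range(X.shape[1]):
        obs = X[:, f][~np.isnan(X[:, f])]
        if obs.size > 1:
            # E[(a - b)^2] = 2 * Var for two independent draws of the same feature
            fallback[f] = 2.0 * np.var(obs)
    return fallback

def per_feature_sq_dist(a, b, fallback):
    """Squared per-feature differences; pairs involving a missing value use the fallback."""
    d = (a - b) ** 2
    miss = np.isnan(d)
    d[miss] = fallback[miss]
    return d

def margin_feature_weights(X, y, n_iter=5):
    """Relief-style weights: a feature scores higher when it enlarges the
    hypothesis margin (distance to nearest miss minus distance to nearest hit)."""
    n, m = X.shape
    fallback = expected_sq_diff(X)
    w = np.zeros(m)
    for _ in range(n_iter):
        for i in range(n):
            diffs = np.array([per_feature_sq_dist(X[i], X[j], fallback) for j in range(n)])
            dists = diffs.sum(axis=1)
            dists[i] = np.inf                      # exclude the instance itself
            same = (y == y[i])
            hit = np.argmin(np.where(same, dists, np.inf))
            miss = np.argmin(np.where(~same, dists, np.inf))
            w += diffs[miss] - diffs[hit]
    return np.maximum(w, 0.0)

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    X = rng.normal(size=(60, 5))
    y = (X[:, 0] + X[:, 1] > 0).astype(int)   # only features 0 and 1 carry the label
    X[rng.random(X.shape) < 0.3] = np.nan     # hypothetical 30% missing-at-random mask
    print(margin_feature_weights(X, y))       # weights of features 0 and 1 should dominate
```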
Similar Articles
Fast SFFS-Based Algorithm for Feature Selection in Biomedical Datasets
Biomedical datasets usually include a large number of features relative to the number of samples. However, some data dimensions may be less relevant or even irrelevant to the output class. Selection of an optimal subset of features is critical, not only to reduce the processing cost but also to improve the classification results. To this end, this paper presents a hybrid method of filter and wr...
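For reference, a minimal sketch of the Sequential Floating Forward Selection (SFFS) wrapper named in the title is given below. The 3-NN scorer and cross-validation settings are assumptions for illustration, not details taken from the paper.

```python
import numpy as np
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsClassifier

def subset_score(X, y, subset):
    """Wrapper criterion: cross-validated accuracy of a 3-NN classifier
    on the chosen feature subset (an assumed scorer, easily swapped)."""
    if not subset:
        return -np.inf
    clf = KNeighborsClassifier(n_neighbors=3)
    return cross_val_score(clf, X[:, sorted(subset)], y, cv=5).mean()

def sffs(X, y, k):
    """Sequential Floating Forward Selection: forward inclusion followed by
    conditional backward exclusion whenever dropping a feature beats the
    best subset recorded so far at that size."""
    selected, best = set(), {}          # best[size] = (score, subset)

    def record(subset):
        s = subset_score(X, y, subset)
        if len(subset) not in best or s > best[len(subset)][0]:
            best[len(subset)] = (s, set(subset))

    while len(selected) < k:
        remaining = [f for f in range(X.shape[1]) if f not in selected]
        selected.add(max(remaining, key=lambda f: subset_score(X, y, selected | {f})))
        record(selected)
        while len(selected) > 2:
            worst = max(selected, key=lambda f: subset_score(X, y, selected - {f}))
            reduced = selected - {worst}
            if subset_score(X, y, reduced) > best[len(reduced)][0]:
                selected = reduced
                record(selected)
            else:
                break
    return sorted(best[k][1])
```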
Margin-based feature selection for hyperspectral data
A margin-based feature selection approach is explored for hyperspectral data. This approach is based on measuring the confidence of a classifier when making predictions on test data. Greedy feature flip and iterative search algorithms, which attempt to maximise margin-based evaluation functions, were used in the present study. Evaluation functions use linear, zero-one and sigmoid utility...
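A minimal sketch of such a margin-based evaluation function with a sigmoid utility is shown below (the function names and the beta default are assumptions; the greedy feature flip search itself is omitted).

```python
import numpy as np

def hypothesis_margin(X, y, i, features):
    """Margin of instance i on the selected features: distance to its
    nearest miss minus distance to its nearest hit."""
    d = np.linalg.norm(X[:, features] - X[i, features], axis=1)
    d[i] = np.inf                      # ignore the instance itself
    same = (y == y[i])
    return d[~same].min() - d[same].min()

def sigmoid_utility_score(X, y, features, beta=1.0):
    """Margin-based evaluation of a feature subset: sum of sigmoid
    utilities of the per-instance hypothesis margins."""
    margins = np.array([hypothesis_margin(X, y, i, features) for i in range(len(y))])
    return float(np.sum(1.0 / (1.0 + np.exp(-beta * margins))))

# Usage: sigmoid_utility_score(X, y, features=[0, 3, 7])
```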
IFSB-ReliefF: A New Instance and Feature Selection Algorithm Based on ReliefF
The growing use of the Internet, together with phenomena such as sensor networks, has led to an unnecessary increase in the volume of information. Although this has many benefits, it creates problems such as the need for more storage space and better processors, as well as for data refinement to remove unnecessary data. Data reduction methods provide ways to select useful data from a large amount of duplicate, incomp...
Improving Chernoff criterion for classification by using the filled function
Linear discriminant analysis is a well-known matrix-based dimensionality reduction method. It is a supervised feature extraction method used in two-class classification problems. However, it is incapable of dealing with data in which the classes have unequal covariance matrices. To address this issue, the Chernoff distance is an appropriate criterion to measure distances between distributions. In the p...
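For illustration, the sketch below computes the Chernoff distance between two Gaussian class models, following a common textbook formulation; at s = 0.5 it reduces to the Bhattacharyya distance and remains informative when the covariance matrices differ. The function name and defaults are assumptions.

```python
import numpy as np

def chernoff_distance(mu1, cov1, mu2, cov2, s=0.5):
    """Chernoff distance between N(mu1, cov1) and N(mu2, cov2) for s in (0, 1)."""
    mix = s * cov1 + (1.0 - s) * cov2
    diff = mu2 - mu1
    quad = 0.5 * s * (1.0 - s) * diff @ np.linalg.solve(mix, diff)
    _, logdet_mix = np.linalg.slogdet(mix)
    _, logdet1 = np.linalg.slogdet(cov1)
    _, logdet2 = np.linalg.slogdet(cov2)
    return quad + 0.5 * (logdet_mix - s * logdet1 - (1.0 - s) * logdet2)
```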
Optimal Feature Selection for Data Classification and Clustering: Techniques and Guidelines
In this paper, principles and existing feature selection methods for classifying and clustering data are introduced. To that end, categorizing frameworks for finding selected subsets, namely search-based and non-search-based procedures, as well as evaluation criteria and data mining tasks, are discussed. In the following, a platform is developed as an intermediate step toward developing an intell...