Forward Semi-supervised Feature Selection
Authors
Abstract
Traditionally, feature selection methods work directly on labeled examples. However, the availability of labeled examples cannot be taken for granted in many real-world applications, such as medical diagnosis, forensic science, and fraud detection, where labeled examples are hard to find. This practical problem calls for “semi-supervised feature selection”: choosing, from both labeled and unlabeled examples, the set of features that yields the most accurate classifier for a given learning algorithm. In this paper, we introduce a “wrapper-type” forward semi-supervised feature selection framework. In essence, it uses unlabeled examples to extend the initial labeled training set. Extensive experiments on publicly available datasets show that our proposed framework generally outperforms both traditional supervised and state-of-the-art “filter-type” semi-supervised feature selection algorithms [5] by 1% to 10% in accuracy.
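The idea of a wrapper-type forward search that extends the labeled set with unlabeled examples can be illustrated with a minimal sketch. This is not the authors' algorithm: the nearest-centroid classifier, the margin-based confidence, the `top_frac` threshold, the held-out validation set, and every function name below are assumptions chosen only to keep the example short and dependency-free.

```python
import random

def centroids(X, y, feats):
    """Per-class mean vector restricted to the feature indices in `feats`."""
    sums, counts = {}, {}
    for row, label in zip(X, y):
        acc = sums.setdefault(label, [0.0] * len(feats))
        for j, f in enumerate(feats):
            acc[j] += row[f]
        counts[label] = counts.get(label, 0) + 1
    return {c: [v / counts[c] for v in s] for c, s in sums.items()}

def predict(cents, row, feats):
    """Return (label, margin): nearest centroid and its lead over the runner-up."""
    dists = sorted(
        (sum((row[f] - c[j]) ** 2 for j, f in enumerate(feats)), label)
        for label, c in cents.items()
    )
    margin = dists[1][0] - dists[0][0] if len(dists) > 1 else 0.0
    return dists[0][1], margin

def score_subset(feats, X_l, y_l, X_u, X_val, y_val, top_frac=0.5):
    """Wrapper score: extend the labeled set with pseudo-labels, then validate."""
    cents = centroids(X_l, y_l, feats)
    # Pseudo-label the unlabeled pool; keep the most confident fraction (assumed rule).
    preds = [(predict(cents, r, feats), r) for r in X_u]
    preds.sort(key=lambda p: p[0][1], reverse=True)
    keep = preds[: int(len(preds) * top_frac)]
    X_ext = X_l + [r for _, r in keep]
    y_ext = y_l + [p[0] for p, _ in keep]
    # Retrain on the extended set and measure accuracy on held-out labeled data.
    cents = centroids(X_ext, y_ext, feats)
    hits = sum(predict(cents, r, feats)[0] == t for r, t in zip(X_val, y_val))
    return hits / len(y_val)

def forward_select(X_l, y_l, X_u, X_val, y_val, max_feats):
    """Greedy forward search: add the feature that most improves the wrapper score."""
    selected, best = [], 0.0
    remaining = list(range(len(X_l[0])))
    while remaining and len(selected) < max_feats:
        scores = [
            (score_subset(selected + [f], X_l, y_l, X_u, X_val, y_val), -f, f)
            for f in remaining
        ]
        top_score, _, top_f = max(scores)  # ties go to the lowest feature index
        if selected and top_score <= best:
            break  # no candidate improves the score, so stop
        selected.append(top_f)
        remaining.remove(top_f)
        best = top_score
    return selected, best

def make_data(n, rng):
    """Synthetic data: feature 0 separates the classes, features 1 and 2 are noise."""
    X, y = [], []
    for i in range(n):
        label = i % 2
        X.append([rng.gauss(3.0 * label, 0.5), rng.gauss(0, 1), rng.gauss(0, 1)])
        y.append(label)
    return X, y

rng = random.Random(0)
X_l, y_l = make_data(6, rng)      # small labeled set
X_u, _ = make_data(100, rng)      # unlabeled pool (labels discarded)
X_val, y_val = make_data(40, rng)
selected, accuracy = forward_select(X_l, y_l, X_u, X_val, y_val, max_feats=2)
print(selected, accuracy)
```

On this synthetic data the informative feature (index 0) should be picked first. In a realistic setting with few labels, cross-validation on the labeled set would likely replace the separate validation split assumed here.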
Similar resources
Cluster homogeneity as a semi-supervised principle for feature selection using mutual information
In this work the principle of homogeneity between labels and data clusters is exploited in order to develop a semi-supervised Feature Selection method. This principle permits the use of cluster information to improve the estimation of feature relevance in order to increase selection performance. Mutual Information is used in a Forward-Backward search process in order to evaluate the relevance o...
Graph Laplacian for Semi-supervised Feature Selection in Regression Problems
Feature selection is fundamental in many data mining or machine learning applications. Most of the algorithms proposed for this task make the assumption that the data are either supervised or unsupervised, while in practice supervised and unsupervised samples are often simultaneously available. Semi-supervised feature selection is thus needed, and has been studied quite intensively these past f...
A Convex Formulation for Semi-Supervised Multi-Label Feature Selection
The explosive growth of multimedia data has brought the challenge of how to efficiently browse, retrieve and organize these data. Under this circumstance, different approaches have been proposed to facilitate multimedia analysis. Several semi-supervised feature selection algorithms have been proposed to exploit both labeled and unlabeled data. However, they are implemented based on graphs, such that th...
Semi-supervised Feature Selection via Spectral Analysis
Feature selection is an important task in effective data mining. A new challenge to feature selection is the so-called “small labeled-sample problem”, in which labeled data is scarce and unlabeled data is plentiful. The paucity of labeled instances provides insufficient information about the structure of the target concept, and can cause supervised feature selection algorithms to fail. Unsupervised fe...
Semi-Supervised Feature Selection with Constraint Sets
In machine learning, classification and recognition are crucial tasks. Any object is recognized with the help of the features associated with it. Among many features, only some help to classify an object correctly. Feature selection is a useful technique for detecting such features. Feature selection is the process of selecting a subset of features to reduce the number of features (dimensionality reduction)...