Automatically Redundant Features Removal for Unsupervised Feature Selection via Sparse Feature Graph
نویسندگان
چکیده
The redundant features existing in high dimensional datasets always affect the performance of learning and mining algorithms. How to detect and remove them is an important research topic in machine learning and data mining research. In this paper, we propose a graph based approach to find and remove those redundant features automatically for high dimensional data. Based on the framework of sparse learning based unsupervised feature selection, Sparse Feature Graph (SFG) is introduced not only to model the redundancy between two features, but also to disclose the group redundancy between two groups of features. With SFG, we can divide the whole features into different groups, and improve the intrinsic structure of data by removing detected redundant features. With accurate data structure, quality indicator vectors can be obtained to improve the learning performance of existing unsupervised feature selection algorithms such as multi-cluster feature selection (MCFS). Our experimental results on benchmark datasets show that the proposed SFG and feature redundancy remove algorithm can improve the performance of unsupervised feature selection algorithms consistently.
منابع مشابه
Nonparametric Redundant Features Removal for Unsupervised Feature Selection: An Empirical Study based on Sparse Feature Graph
It is known that the redundant features existing in high dimensional datasets will affect the performance of learning and mining algorithms. How to detect and remove them is a challenge in machine learning and data mining research. In this work, we propose a graph based approach to search and remove those redundant features automatically for high dimensional data. A sparse graph is generated at...
متن کاملA New Hybrid Framework for Filter based Feature Selection using Information Gain and Symmetric Uncertainty (TECHNICAL NOTE)
Feature selection is a pre-processing technique used for eliminating the irrelevant and redundant features which results in enhancing the performance of the classifiers. When a dataset contains more irrelevant and redundant features, it fails to increase the accuracy and also reduces the performance of the classifiers. To avoid them, this paper presents a new hybrid feature selection method usi...
متن کاملA hybrid filter-based feature selection method via hesitant fuzzy and rough sets concepts
High dimensional microarray datasets are difficult to classify since they have many features with small number ofinstances and imbalanced distribution of classes. This paper proposes a filter-based feature selection method to improvethe classification performance of microarray datasets by selecting the significant features. Combining the concepts ofrough sets, weighted rough set, fuzzy rough se...
متن کاملGraph Autoencoder-Based Unsupervised Feature Selection with Broad and Local Data Structure Preservation
Feature selection is a dimensionality reduction technique that selects a subset of representative features from highdimensional data by eliminating irrelevant and redundant features. Recently, feature selection combined with sparse learning has attracted significant attention due to its outstanding performance compared with traditional feature selection methods that ignores correlation between ...
متن کاملRemoving redundant features via clustering: preliminary results in mental task separation
Recent clustering algorithms have been designed to take into account the degree of relevance of each feature, by automatically calculating their weights. However, as the tendency is to evaluate each feature at a time, these algorithms may have difficulties dealing with features containing similar information. Should this information be relevant, these algorithms would set high weights to all su...
متن کاملذخیره در منابع من
با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید
برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید
ثبت ناماگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید
ورودعنوان ژورنال:
- CoRR
دوره abs/1705.04804 شماره
صفحات -
تاریخ انتشار 2017