Using Feature Clustering for GP-Based Feature Construction on High-Dimensional Data
نویسندگان
چکیده
Feature construction is a pre-processing technique to create new features with better discriminating ability from the original features. Genetic programming (GP) has been shown to be a prominent technique for this task. However, applying GP to high-dimensional data is still challenging due to the large search space. Feature clustering groups similar features into clusters, which can be used for dimensionality reduction by choosing representative features from each cluster to form the feature subset. Feature clustering has been shown promising in feature selection; but has not been investigated in feature construction for classification. This paper presents the first work of utilising feature clustering in this area. We propose a cluster-based GP feature construction method called CGPFC which uses feature clustering to improve the performance of GP for feature construction on high-dimensional data. Results on eight high-dimensional datasets with varying difficulties show that the CGPFC constructed features perform better than the original full feature set and features constructed by the standard GP constructor based on the whole feature set.
منابع مشابه
Supervised Feature Extraction of Face Images for Improvement of Recognition Accuracy
Dimensionality reduction methods transform or select a low dimensional feature space to efficiently represent the original high dimensional feature space of data. Feature reduction techniques are an important step in many pattern recognition problems in different fields especially in analyzing of high dimensional data. Hyperspectral images are acquired by remote sensors and human face images ar...
متن کاملFeature Selection for Small Sample Sets with High Dimensional Data Using Heuristic Hybrid Approach
Feature selection can significantly be decisive when analyzing high dimensional data, especially with a small number of samples. Feature extraction methods do not have decent performance in these conditions. With small sample sets and high dimensional data, exploring a large search space and learning from insufficient samples becomes extremely hard. As a result, neural networks and clustering a...
متن کاملSteel Consumption Forecasting Using Nonlinear Pattern Recognition Model Based on Self-Organizing Maps
Steel consumption is a critical factor affecting pricing decisions and a key element to achieve sustainable industrial development. Forecasting future trends of steel consumption based on analysis of nonlinear patterns using artificial intelligence (AI) techniques is the main purpose of this paper. Because there are several features affecting target variable which make the analysis of relations...
متن کاملGenetic programming for feature construction and selection in classification on high-dimensional data
Classification on high-dimensional data with thousands to tens of thousands of dimensions is a challenging task due to the high dimensionality and the quality of the feature set. The problem can be addressed by using feature selection to choose only informative features or feature construction to create new high-level features. Genetic programming (GP) using a tree-based representation can be u...
متن کاملOptimal Feature Selection for Data Classification and Clustering: Techniques and Guidelines
In this paper, principles and existing feature selection methods for classifying and clustering data be introduced. To that end, categorizing frameworks for finding selected subsets, namely, search-based and non-search based procedures as well as evaluation criteria and data mining tasks are discussed. In the following, a platform is developed as an intermediate step toward developing an intell...
متن کاملذخیره در منابع من
با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید
عنوان ژورنال:
دوره شماره
صفحات -
تاریخ انتشار 2017