Important Features Detection in Continuous Data
Abstract
This paper presents a method for calculating the importance factor of continuous features from a given set of patterns. A real problem in many practical cases, such as medical data, is finding which parts of the patterns are crucial for correct classification. This creates a need to preprocess all the data, which affects both the running time and the accuracy of the applied methods (when unimportant data hide the important data). Some existing methods select important features for binary and sometimes discrete data or, after some preprocessing, for continuous data. Very often, however, such a conversion carries the risk of losing important information, because the optimal discretization is not known. The proposed method avoids that problem because it works on the original, non-transformed continuous data. Two factors, concentration and diversity, are defined and used to calculate the importance factor for each feature and pattern. Based on those factors, unimportant features can be identified, for example, to reduce the dimension of the input data, or "bad" patterns can be detected to improve classification. An example of how the proposed method can be used to improve a decision tree is also given.
Keywords: important features extraction; continuous data analysis; decision tree.
Similar resources
A suitable data model for HIV infection and epidemic detection
Background: In recent years, there has been an increase in the amount and variety of data generated in the field of healthcare (e.g., data related to the prevalence of contagious diseases in society). The various patterns of relationships between individuals in society make network analysis a complex and highly important process in detecting and preventing the incidence of diseases....
Fault Detection of Anti-friction Bearing using Ensemble Machine Learning Methods
The Anti-Friction Bearing (AFB) is a very important machine component, and its unscheduled failure causes malfunctions in a wide range of rotating machinery, resulting in unexpected downtime and economic loss. In this paper, ensemble machine learning techniques are demonstrated for the detection of different AFB faults. Initially, statistical features were extracted from temporal vibratio...
Integration of Visible Image and LIDAR Altimetric Data for Semi-Automatic Detection and Measuring the Boundari of Features
This paper presents a new method for detecting features using LiDAR data and visible images. The proposed feature detection algorithm has minimal dependency on the region and on the type of sensor used for imaging, and for any input LiDAR and image data, including visible bands (red, green and blue) with high spatial resolution, it identifies features with acceptable accuracy. In the proposed app...
MEFUASN: A Helpful Method to Extract Features using Analyzing Social Network for Fraud Detection
Fraud detection is one of the ways to cope with the damage caused by fraudulent activities, which have become common due to the rapid development of the Internet and electronic business. There is a need for methods that detect fraud accurately and quickly. To achieve accuracy, fraud detection methods need to consider both kinds of features, features based on user level and features based o...
Target Detection and Recognition Using Two-dimensional Isotropic and Anisotropic Wavelets
Automatic target detection and recognition (ATR) requires the ability to optimally extract the essential features of an object from (usually) cluttered environments. In this regard, efficient data representation domains are required in which the important target features are both compactly and clearly represented, enhancing ATR. Since both detection and identification are important, multidimension...