Dealing with Predictive-but-Unpredictable Attributes in Noisy Data Sources
Authors
Abstract
Attribute noise can affect classification learning. Previous work in handling attribute noise has focused on those predictable attributes that can be predicted by the class and other attributes. However, attributes can often be predictive but unpredictable. Being predictive, they are essential to classification learning and it is important to handle their noise. Being unpredictable, they require strategies different from those of predictable attributes. This paper presents a study on identifying, cleansing and measuring noise for predictive-but-unpredictable attributes. New strategies are accordingly proposed. Both theoretical analysis and empirical evidence suggest that these strategies are more effective and more efficient than previous alternatives.
Similar resources
A New Algorithm for Optimization of Fuzzy Decision Tree in Data Mining
Decision-tree algorithms provide one of the most popular methodologies for symbolic knowledge acquisition. The resulting knowledge, a symbolic decision tree along with a simple inference mechanism, has been praised for comprehensibility. The most comprehensible decision trees have been designed for perfect symbolic data. Classical crisp decision trees (DT) are widely applied to classification t...
Classification with Pedigree and its Applicability to Record Linkage
Real-world data is virtually never noise-free. Current methods for handling noise do so either by removing noisy instances or by trying to clean noisy attributes. Neither of these deals directly with the issue of noise, and in fact removing a noisy instance is not a viable option in many real systems. In this paper, we consider the problem of noise in the context of record linkage, a frequent pro...
Estimation of Source Location Using Curvature Analysis
A quadratic surface can be fitted to potential-field data within 3×3 windows, which allow us to calculate curvature attributes from its coefficients. Phillips (2007) derived an equation depending on the most negative curvature to obtain the depth and structural index of isolated sources from peak values of special functions. They divided the special functions into two categories: Model-specific...
An Adaptive Weighted Fuzzy Controller Applied on Quality of Service of Intelligent 5G Environments
In the computational intelligence area, it is appropriate to carry out analysis in order to interpret the concept and sources of uncertainty and the conditions under which it occurs, and hence to pursue reliable techniques for dealing with it. Dealing with uncertainties in this case is a challenging and multidisciplinary activity. So, there is a need for a capable tool for modeling, control, and analyti...
Correlation-based Feature Selection using Ant Colony Optimization
Feature selection has recently been the subject of intensive research in data mining, especially for datasets with a large number of attributes. Recent work has shown that feature selection can have a positive effect on the performance of machine learning algorithms. The success of many learning algorithms in their attempts to construct models of data hinges on the reliable identification of a ...