Dealing with Predictive-but-Unpredictable Attributes in Noisy Data Sources

Authors

  • Ying Yang
  • Xindong Wu
  • Xingquan Zhu
Abstract

Attribute noise can affect classification learning. Previous work in handling attribute noise has focused on those predictable attributes that can be predicted by the class and other attributes. However, attributes can often be predictive but unpredictable. Being predictive, they are essential to classification learning and it is important to handle their noise. Being unpredictable, they require strategies different from those of predictable attributes. This paper presents a study on identifying, cleansing and measuring noise for predictive-but-unpredictable attributes. New strategies are accordingly proposed. Both theoretical analysis and empirical evidence suggest that these strategies are more effective and more efficient than previous alternatives.
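
The abstract does not spell out the paper's strategies. The sketch below is therefore only a hypothetical illustration of the two properties it contrasts, not the authors' method. It assumes categorical data and scores each attribute with empirical entropies. Predictiveness is taken as 1 - H(C | A) / H(C), the share of class uncertainty the attribute removes. Predictability is taken as 1 - H(A | C, other attributes) / H(A), the share of the attribute's uncertainty removed by the class and the remaining attributes. The function names, both scores, and the toy data are all assumptions made for illustration.

```python
from collections import Counter, defaultdict
from math import log2


def entropy(values):
    """Shannon entropy (bits) of a list of categorical values."""
    counts = Counter(values)
    n = len(values)
    return -sum((c / n) * log2(c / n) for c in counts.values())


def conditional_entropy(targets, contexts):
    """Empirical H(target | context); contexts[i] is a hashable key for row i."""
    groups = defaultdict(list)
    for t, ctx in zip(targets, contexts):
        groups[ctx].append(t)
    n = len(targets)
    return sum(len(g) / n * entropy(g) for g in groups.values())


def predictiveness(rows, j):
    """Hypothetical score: 1 - H(C | A_j) / H(C). The class is the last column."""
    cls = [r[-1] for r in rows]
    attr = [r[j] for r in rows]
    h_c = entropy(cls)
    return 0.0 if h_c == 0 else 1 - conditional_entropy(cls, attr) / h_c


def predictability(rows, j):
    """Hypothetical score: 1 - H(A_j | class, other attributes) / H(A_j).

    The context is every other column, class included. On small samples,
    contexts that occur only once make this estimate optimistic.
    """
    attr = [r[j] for r in rows]
    contexts = [r[:j] + r[j + 1:] for r in rows]
    h_a = entropy(attr)
    return 1.0 if h_a == 0 else 1 - conditional_entropy(attr, contexts) / h_a


# Hypothetical toy data: (A0, A1, A2, class).
# A0 has four values, and the class is "pos" exactly when A0 is "a" or "b".
# A1 is pure noise. A2 is a copy of the class under another name.
rows = [
    (a0, a1, "p" if a0 in "ab" else "n", "pos" if a0 in "ab" else "neg")
    for a0 in "abcd"
    for a1 in "xy"
]

for j, name in enumerate(["A0", "A1", "A2"]):
    print(name, round(predictiveness(rows, j), 2), round(predictability(rows, j), 2))
# A0 1.0 0.5
# A1 0.0 0.0
# A2 1.0 1.0
```

Here A0 fully determines the class but is only half-predictable, because knowing the class (and the unrelated A1) narrows its four values down to two. This is the kind of attribute the abstract calls predictive but unpredictable. It also suggests why, as the abstract argues, strategies built for predictable attributes do not carry over directly.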

Similar resources

A New Algorithm for Optimization of Fuzzy Decision Tree in Data Mining

Decision-tree algorithms provide one of the most popular methodologies for symbolic knowledge acquisition. The resulting knowledge, a symbolic decision tree along with a simple inference mechanism, has been praised for comprehensibility. The most comprehensible decision trees have been designed for perfect symbolic data. Classical crisp decision trees (DT) are widely applied to classification t...

Classification with Pedigree and its Applicability to Record Linkage

Real-world data is virtually never noise-free. Current methods for handling noise do so either by removing noisy instances or by trying to clean noisy attributes. Neither of these deals directly with the issue of noise, and in fact removing a noisy instance is not a viable option in many real systems. In this paper, we consider the problem of noise in the context of record linkage, a frequent pro...

Estimation of Source Location Using Curvature Analysis

A quadratic surface can be fitted to potential-field data within 3×3 windows, which allow us to calculate curvature attributes from its coefficients. Phillips (2007) derived an equation depending on the most negative curvature to obtain the depth and structural index of isolated sources from peak values of special functions. They divided the special functions into two categories: Model-specific...

An Adaptive Weighted Fuzzy Controller Applied on Quality of Service of Intelligent 5G Environments

In the computational intelligence area, it is suitable to fulfill the analysis in order to interpret the concept and sources of uncertainty and the conditions of its incidence, and hence pursuit for reliable techniques of dealing with it. Dealing with uncertainties in this case is a challenging and multidisciplinary activity. So, there is a need for a capable tool for modeling, control, and analyti...

Correlation-based Feature Selection using Ant Colony Optimization

Feature selection has recently been the subject of intensive research in data mining, especially for datasets with a large number of attributes. Recent work has shown that feature selection can have a positive effect on the performance of machine learning algorithms. The success of many learning algorithms in their attempts to construct models of data hinges on the reliable identification of a ...

Publication date: 2004