Estimating Missing Attribute Values Using Dynamically-Ordered Attribute Trees

Author

  • Jing Wang
Abstract

Classification performance can degrade if data contain missing attribute values. Many methods deal with missing information in a simple way, such as replacing missing values with the global or class-conditional mean/mode. We propose a new iterative algorithm to effectively estimate missing attribute values in both training data and test data. The attributes are selected one by one to be completed. For each attribute, the unknown values are predicted by a decision tree built from the other attributes, using the cases in which that attribute's value is known. The training set filled in this way is used to classify a tuning set, and the resulting prediction error rate decides which attribute is selected to be filled in the current iteration. The prediction error rate on the tuning set is recorded at each iteration to determine an optimal stopping point, since filling all missing values may lead to overfitting. The experiments show that the method generally outperforms several reasonable baseline methods and the ordered attribute trees method proposed by Lobo and Numao.
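The loop described in the abstract can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the paper's implementation: attributes are categorical, `None` marks a missing value, the tree learner is a tiny stand-in that splits on the attribute with the fewest misclassified cases, and ties in tuning error keep the earlier stopping point. All function names (`build_tree`, `predict`, `fill_attribute`, `error_rate`, `iterative_fill`) are hypothetical.

```python
from collections import Counter

def majority(values):
    # Most frequent value; ties go to the value seen first.
    return Counter(values).most_common(1)[0][0]

def build_tree(rows, labels, attrs, depth=3):
    # Tiny categorical tree (assumption: split on the attribute with the
    # fewest misclassified cases; a missing value, None, is its own branch).
    if depth == 0 or not attrs or len(set(labels)) == 1:
        return majority(labels)

    def misclassified(attr):
        groups = {}
        for row, label in zip(rows, labels):
            groups.setdefault(row[attr], []).append(label)
        return sum(len(g) - Counter(g).most_common(1)[0][1] for g in groups.values())

    best = min(attrs, key=misclassified)
    rest = [a for a in attrs if a != best]
    children = {}
    for value in {row[best] for row in rows}:
        pairs = [(r, y) for r, y in zip(rows, labels) if r[best] == value]
        children[value] = build_tree([r for r, _ in pairs], [y for _, y in pairs], rest, depth - 1)
    return (best, children, majority(labels))

def predict(tree, row):
    # Internal nodes are tuples; leaves are plain (non-tuple) labels.
    while isinstance(tree, tuple):
        attr, children, default = tree
        tree = children.get(row[attr], default)
    return tree

def fill_attribute(rows, attr, attrs):
    # Predict attr's missing values from the other attributes, training only
    # on cases where attr is known. Returns new rows; the input is untouched.
    known = [r for r in rows if r[attr] is not None]
    if not known:
        return [dict(r) for r in rows]
    others = [a for a in attrs if a != attr]
    tree = build_tree(known, [r[attr] for r in known], others)
    return [{**r, attr: predict(tree, r)} if r[attr] is None else dict(r) for r in rows]

def error_rate(train, tune, attrs, cls):
    # Train a classifier on the (partly filled) training set, score the tuning set.
    tree = build_tree(train, [r[cls] for r in train], attrs)
    return sum(predict(tree, r) != r[cls] for r in tune) / len(tune)

def iterative_fill(train, tune, attrs, cls="class"):
    current = [dict(r) for r in train]
    best_rows, best_err = current, error_rate(current, tune, attrs, cls)
    history = [best_err]  # tuning error before filling and after each iteration
    pending = [a for a in attrs if any(r[a] is None for r in current)]
    while pending:
        # Try filling each remaining attribute; keep the one with the lowest tuning error.
        candidates = {a: fill_attribute(current, a, attrs) for a in pending}
        errors = {a: error_rate(rows, tune, attrs, cls) for a, rows in candidates.items()}
        chosen = min(pending, key=errors.get)
        current = candidates[chosen]
        pending.remove(chosen)
        history.append(errors[chosen])
        if errors[chosen] < best_err:  # assumption: ties keep the earlier iteration
            best_rows, best_err = current, errors[chosen]
    return best_rows, history
```

Calling `iterative_fill(train, tune, attrs)` with lists of dictionaries returns the training rows at the iteration with the lowest recorded tuning error, together with the error history. Returning the best iteration rather than the fully filled set reflects the abstract's point that filling every missing value may overfit.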

Similar resources

Une approche probabiliste pour le classement d'objets incomplètement connus dans un arbre de décision. (A Probabilistic Approach to Classify Incomplete Objects in a Decision Tree)

We describe in this thesis an approach to fill missing values in decision trees during the classification phase. This approach is derived from the ordered attribute trees (OAT) method, proposed by Lobo and Numao in 2000, which builds a decision tree for each attribute and uses these trees to fill the missing attribute values. It is based on the Mutual Information between the attribu...

Missing Value Estimation Based on Dynamic Attribute Selection

Raw data used in data mining often contain missing information, which inevitably degrades the quality of the derived knowledge. In this paper, a new method of guessing missing attribute values is suggested. This method selects attributes one by one using attribute group mutual information calculated by flattening the already selected attributes. As each new attribute is added, its missing value...

Learning Decision Tree Classifiers from Attribute Value Taxonomies and Partially Specified Data

We consider the problem of learning to classify partially specified instances, i.e., instances that are described in terms of attribute values at different levels of precision, using user-supplied attribute value taxonomies (AVT). We formalize the problem of learning from AVT and data and present an AVT-guided decision tree learning algorithm (AVT-DTL) to learn classification rules at multiple l...

Triangular Intuitionistic Fuzzy Triple Bonferroni Harmonic Mean Operators and Application to Multi-attribute Group Decision Making

As a special intuitionistic fuzzy set defined on the real number set, the triangular intuitionistic fuzzy number (TIFN) is a fundamental tool for quantifying an ill-known quantity. In order to model the decision maker's overall preference with mandatory requirements, it is necessary to develop some Bonferroni harmonic mean operators for TIFNs which can be used to effectively integrate the informa...

A Comparative Study on Decision Rule Induction for incomplete data using Rough Set and Random Tree Approaches

Handling missing attribute values is the most challenging process in data analysis. Many approaches can be adopted to handle missing attributes. In this paper, a comparative analysis is made on an incomplete dataset for future prediction using the rough set approach and random tree generation in data mining. The result of simple classification technique (using random tree ...

Publication date: 2009