Learning Characteristic Decision Trees

Author

  • Paul Davidsson
Abstract

Decision trees constructed by ID3-like algorithms suffer from an inability to detect instances of categories not present in the set of training examples, i.e., they are discriminative representations. Instead, such instances are assigned to one of the classes actually present in the training set, resulting in undesired misclassifications. Two methods of reducing this problem by learning characteristic representations are presented. The central idea behind both methods is to augment each leaf of the decision tree with a subtree containing additional information concerning each feature’s values in that leaf. This is done by computing two limits (lower and upper) for every feature from the training instances belonging to the leaf. A subtree is then constructed from these limits that tests every feature; if the value is below the lower limit or above the upper limit for some feature, the instance will be rejected, i.e., regarded as belonging to a novel class. This subtree is then appended to the leaf. The first method presented corresponds to creating a maximally specific description, whereas the second is a novel method that makes use of the information about the statistical distribution of the feature values that can be extracted from the training examples. An important property of the novel method is that the degree of generalization can be controlled. The methods are evaluated empirically in two different domains, the Iris classification problem and a novel coin classification problem. It is concluded that the dynamic properties of the second method make it preferable in most applications. Finally, we argue that this method is very general in that it can, in principle, be applied to any empirical learning algorithm.
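The leaf-augmentation idea described in the abstract can be sketched in a few lines of Python. The sketch below is illustrative only and is not the paper's implementation: it attaches per-leaf feature limits to a scikit-learn decision tree and flags out-of-range instances as novel. The class name CharacteristicTree, the parameter k_std, and the use of mean ± k·std limits for the statistical variant are assumptions, since the abstract does not give the exact formulas.

```python
# Minimal sketch of leaf augmentation for novelty rejection (not the paper's
# exact algorithm). With k_std=None the limits are the per-feature min/max of
# the leaf's training instances (maximally specific description); with k_std=k
# they are mean +/- k*std, an assumed stand-in for the distribution-based
# limits whose generalization degree can be controlled.
import numpy as np
from sklearn.tree import DecisionTreeClassifier

class CharacteristicTree:
    def __init__(self, k_std=None):
        self.k_std = k_std
        self.tree = DecisionTreeClassifier()
        self.limits = {}  # leaf id -> (lower, upper) arrays, one value per feature

    def fit(self, X, y):
        X = np.asarray(X, dtype=float)
        self.tree.fit(X, y)
        leaf_ids = self.tree.apply(X)  # leaf index reached by each training instance
        for leaf in np.unique(leaf_ids):
            X_leaf = X[leaf_ids == leaf]
            if self.k_std is None:
                lower, upper = X_leaf.min(axis=0), X_leaf.max(axis=0)
            else:
                mu, sigma = X_leaf.mean(axis=0), X_leaf.std(axis=0)
                lower, upper = mu - self.k_std * sigma, mu + self.k_std * sigma
            self.limits[leaf] = (lower, upper)
        return self

    def predict(self, X):
        X = np.asarray(X, dtype=float)
        labels = self.tree.predict(X)
        leaf_ids = self.tree.apply(X)
        novel = np.zeros(len(X), dtype=bool)
        for i, leaf in enumerate(leaf_ids):
            lower, upper = self.limits[leaf]
            # Reject (flag as novel) if any feature falls outside the leaf's limits.
            novel[i] = bool(np.any(X[i] < lower) or np.any(X[i] > upper))
        return labels, novel
```

With k_std unset, the limits reduce to the tightest box around the leaf's training instances; increasing k_std loosens the limits, which is one way to mimic the controllable degree of generalization mentioned in the abstract.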

Similar Articles

Automated Sleep Stage Scoring by Decision Tree Learning

In this paper we describe a waveform recognition method that extracts characteristic parameters from waveforms and a method of automated sleep stage scoring using decision tree learning that is in practice regarded as one of the most successful machine learning methods. In our method, first, characteristics of EEG, EOG and EMG are compared with characteristic features of alpha waves, delta waves, ...

Predicting Learners’ Performance in an E-Learning Platform Based on Decision Tree Analysis

The ability to predict learners' performance on an e-learning platform is a decisive factor in current educational systems. Indeed, learning through decision trees uses more sophisticated and efficient algorithms based on predictive models. A decision tree is a decision support tool for assessing the value of a characteristic of a population based on the observation of other char...

An Improved Medical Diagnosing of Acute Abdominal Pain with Decision Tree

In medical decision making (e.g., classification) we expect that decisions will be made effectively and reliably. Decision-making systems, with their ability to learn automatically, seem to be very appropriate for performing such tasks. Decision trees provide high classification accuracy with a simple representation of gathered knowledge. These advantages have led decision trees to be widely u...

Speech intention understanding based on decision tree learning

This paper proposes a method of speech intention understanding based on a spoken dialogue corpus to which intention tags are given. The intention tag expresses the task-dependent intention of the speaker, and therefore, proper understanding enables a spoken dialogue system to take appropriate actions. We have tagged about 35000 utterances in the CIAIR in-car speech database. In our metho...

Stochastic Attribute Selection Committees with Multiple Boosting: Learning More

Classifier learning is a key technique for KDD. Approaches to learning classifier committees, including Boosting, Bagging, Sasc, and SascB, have demonstrated great success in increasing the prediction accuracy of decision trees. Boosting and Bagging create different classifiers by modifying the distribution of the training set. Sasc adopts a different method. It generates committees by stochastic ma...

Using Decision Trees and Support Vector Machines to Classify Genes by Names

In this paper we report an application of machine learning methods to classify gene names into two categories: known and unknown ones. We acquired a data set of 1,624 genes by letting a human expert classify them manually. To capture the knowledge of classification, we also asked the expert to derive a set of rules. In parallel, we trained two machine learners to capture the same knowledge. Bot...

Journal title:

Volume   Issue

Pages   -

Publication date: 1994