Learning Characteristic Decision Trees
Abstract
Decision trees constructed by ID3-like algorithms cannot detect instances of categories that are absent from the set of training examples, because they are discriminative representations. Instead, such instances are assigned to one of the classes actually present in the training set, resulting in undesired misclassifications. Two methods of reducing this problem by learning characteristic representations are presented. The central idea behind both methods is to augment each leaf of the decision tree with a subtree containing additional information about each feature’s values in that leaf. This is done by computing two limits (lower and upper) for every feature from the training instances belonging to the leaf. A subtree that tests every feature is then constructed from these limits; if the value of some feature is below its lower limit or above its upper limit, the instance is rejected, i.e., regarded as belonging to a novel class. This subtree is then appended to the leaf. The first method presented corresponds to creating a maximally specific description, whereas the second is a novel method that makes use of the information about the statistical distribution of the feature values that can be extracted from the training examples. An important property of the novel method is that the degree of generalization can be controlled. The methods are evaluated empirically in two different domains, the Iris classification problem and a novel coin classification problem. It is concluded that the dynamical properties of the second method make it preferable in most applications. Finally, we argue that this method is very general in that it can, in principle, be applied to any empirical learning algorithm.
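The leaf augmentation described in the abstract can be illustrated with a minimal Python sketch. It covers only the per-leaf limits and the rejection test, not the construction of the decision tree itself. The function names, the sample values, and the label "A" are hypothetical. The abstract does not state how the second method derives its limits from the statistical distribution, so the sketch assumes one plausible form: the mean plus or minus k sample standard deviations, where k stands in for the controllable degree of generalization.

```python
from statistics import mean, stdev


def min_max_limits(rows):
    """Maximally specific limits: per-feature minimum and maximum
    over the training instances that reached the leaf."""
    return [(min(col), max(col)) for col in zip(*rows)]


def statistical_limits(rows, k=2.0):
    """ASSUMED form of the second method (not given in the abstract):
    per-feature mean +/- k * sample standard deviation.
    A larger k widens the limits, i.e. generalizes more."""
    limits = []
    for col in zip(*rows):
        m = mean(col)
        s = stdev(col) if len(col) > 1 else 0.0
        limits.append((m - k * s, m + k * s))
    return limits


def classify_at_leaf(leaf_label, limits, instance):
    """Return the leaf's class if every feature lies within its limits;
    otherwise return None, meaning 'rejected as a novel class'."""
    inside = all(lo <= x <= hi for (lo, hi), x in zip(limits, instance))
    return leaf_label if inside else None


# Hypothetical training instances (two features) that ended up in one leaf.
leaf_rows = [(5.0, 3.4), (5.1, 3.5), (4.9, 3.0), (5.4, 3.9)]

strict = min_max_limits(leaf_rows)
loose = statistical_limits(leaf_rows, k=2.0)

print(classify_at_leaf("A", strict, (5.5, 3.4)))  # just outside the observed range
print(classify_at_leaf("A", loose, (5.5, 3.4)))   # inside mean +/- 2 std
print(classify_at_leaf("A", loose, (7.0, 3.4)))   # far outside both
```

With the min/max limits, any value outside the observed range is rejected, however slightly. Under the assumed statistical limits, changing k trades false rejections of known classes against false acceptances of novel ones.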
Similar resources
Automated Sleep Stage Scoring by Decision Tree Learning
In this paper we describe a waveform recognition method that extracts characteristic parameters from waveforms and a method of automated sleep stage scoring using decision tree learning, which is in practice regarded as one of the most successful machine learning methods. In our method, first characteristics of EEG, EOG and EMG are compared with characteristic features of alpha waves, delta waves, ...
Predicting Learners’ Performance in an E-Learning Platform Based on Decision Tree Analysis
The ability to predict learners' performance on an e-learning platform is a decisive factor in the current educational systems. Indeed, learning through decision trees uses more sophisticated and efficient algorithms based on the use of predictive models. A decision tree is a decision support tool for assessing the value of a characteristic of a population based on the observation of other char...
An Improved Medical Diagnosing of Acute Abdominal Pain with Decision Tree
In medical decision making (e.g., classification) we expect that decisions will be made effectively and reliably. Decision-making systems that can learn automatically seem very appropriate for such tasks. Decision trees provide high classification accuracy with a simple representation of gathered knowledge. Those advantages cause that decision trees have been widely u...
Speech intention understanding based on decision tree learning
This paper proposes a method of speech intention understanding based on a spoken dialogue corpus to which intention tags have been added. The intention tag expresses the task-dependent intention of the speaker, so understanding it properly enables a spoken dialogue system to take appropriate actions. We have tagged about 35000 utterances in the CIAIR in-car speech database. In our metho...
Stochastic Attribute Selection Committees with Multiple Boosting: Learning More
Classifier learning is a key technique for KDD. Approaches to learning classifier committees, including Boosting, Bagging, Sasc, and SascB, have demonstrated great success in increasing the prediction accuracy of decision trees. Boosting and Bagging create different classifiers by modifying the distribution of the training set. Sasc adopts a different method. It generates committees by stochastic ma...
Using Decision Trees and Support Vector Machines to Classify Genes by Names
In this paper we report an application of machine learning methods to classify gene names into two categories: known and unknown ones. We acquired a data set of 1,624 genes by letting a human expert classify them manually. To capture the knowledge of classification, we also asked the expert to derive a set of rules. In parallel, we trained two machine learners to capture the same knowledge. Bot...