Pruning Irrelevant Features from Oblivious Decision Trees
Abstract
In this paper, we examine an approach to feature selection designed to handle domains that involve both irrelevant and interacting features. We review the reasons this situation poses challenges to both nearest neighbor and decision-tree methods, then describe a new algorithm, OBLIVION, that carries out greedy pruning of oblivious decision trees. We summarize the results of experiments with artificial domains, which show that OBLIVION's sample complexity grows slowly with the number of irrelevant features, and with natural domains, which suggest that few existing data sets contain many irrelevant features. In closing, we consider other work on feature selection and outline directions for future research.

1. Nature of the Problem

One of the central problems in machine induction involves discriminating between features that are relevant to the target concept and ones that are irrelevant. Presumably, many real-world learning tasks contain large numbers of irrelevant terms, and for such tasks, one would prefer to use algorithms that scale well along this dimension. More specifically, one would like the number of training instances needed to reach a given level of accuracy (the sample complexity) to grow slowly with increasing numbers of irrelevant features.

We define relevance in the context of such an induction task. Given a set of classified training instances for some target concept, the goal is to improve classification accuracy on a set of novel test instances. One way to improve accuracy involves identifying the features relevant to the target concept. Following John, Kohavi, and Pfleger (1994), we say that a feature is relevant if it belongs to some subset of the known features that is minimally sufficient to correctly classify instances. John et al. break their definition down further into notions of strong and weak relevance, but in this paper we will not find it necessary to distinguish the two senses.
Some previous experimental studies have examined the effect of irrelevant features on learning. For example, Aha (1990) reports experiments with a simple Boolean target concept which suggest that the sample complexity for the simple nearest neighbor method is exponential in the number of irrelevant features. Techniques for inducing decision trees, such as Quinlan's (1993) C4.5, do much better on conjunctive and similar target concepts because they attempt to select relevant features and eliminate irrelevant ones. However, such methods typically carry out a greedy search through the space of decision trees. This approach works well in domains where there is little interaction among the relevant attributes, as in conjunctive concepts, but the presence of attribute interactions, such as occur in parity concepts, can cause significant problems for this scheme. Experimental studies by Almuallim and Dietterich (1991) and by Kira and Rendell (1992) suggest that, for some target concepts, methods for decision-tree induction also deal poorly with irrelevant features.

In response to this problem, Almuallim and Dietterich (1990) developed Focus, an algorithm which directly searches for minimal combinations of attributes that perfectly discriminate among the classes. This method begins by looking at each feature in isolation, then turns to pairs of features, triples, and so forth, halting as soon as it finds a combination that generates pure partitions of the training set (i.e., in which no instances have different classes). Their scheme then passes on the reduced set of features to ID3, which constructs a decision tree from the simplified training data. Comparative studies with ID3 and with Pagallo and Haussler's (1990) FRINGE showed that, for a given number of training cases on randomly selected Boolean target concepts, Focus was almost unaffected by the introduction of irrelevant attributes, whereas the accuracy of ID3 and FRINGE degraded significantly.
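The search that Focus performs, as described above, can be illustrated with a minimal Python sketch. This is not Almuallim and Dietterich's implementation: the function names `is_pure` and `focus_search`, the tuple-based data layout, and the toy parity data set are all assumptions made for illustration, and the sketch omits the later step of handing the reduced feature set to ID3.

```python
from itertools import combinations, product


def is_pure(instances, labels, subset):
    """True if no two instances agree on every feature in `subset`
    yet carry different class labels (i.e., all partitions are pure)."""
    seen = {}
    for x, y in zip(instances, labels):
        key = tuple(x[i] for i in subset)
        if seen.setdefault(key, y) != y:
            return False
    return True


def focus_search(instances, labels):
    """Hypothetical helper: try single features, then pairs, triples, ...
    and return the first subset that gives pure partitions."""
    n_features = len(instances[0])
    for size in range(1, n_features + 1):
        for subset in combinations(range(n_features), size):
            if is_pure(instances, labels, subset):
                return subset
    return None  # reachable only if identical instances have different classes


# Toy parity concept (an assumption for illustration): the class is the
# XOR of features 0 and 2, while feature 1 is irrelevant.
data = list(product([0, 1], repeat=3))
labels = [x[0] ^ x[2] for x in data]
minimal_subset = focus_search(data, labels)
```

On this toy data no single feature yields pure partitions, which is the kind of attribute interaction that misleads a greedy, one-feature-at-a-time search; the exhaustive search instead recovers the interacting pair. The cost is that the number of subsets examined grows quickly with the size of the smallest sufficient subset.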
Schlimmer (1993) has described a similar method that also starts with individual attributes and searches the space of attribute combinations, continuing until it finds a partition of the training set that has pure classes. Both of these algorithms address the problem of attribute interaction in the presence of irrelevant features by directly examining combinations of features. At least for noise-free data, this approach has the advantage of guaranteeing identification of minimal relevant feature sets, in contrast to the greedy approach used by C4.5 and its relatives. However, the price is greatly increased computational cost. Almuallim and Dietterich showed that Focus' time complexity is quasipolynomial in the number of attributes, which they acknowledged is impractical for target concepts that involve many features. Schlimmer introduced techniques for pruning the search tree without losing completeness, but even with these savings, he had to limit the length of feature combinations considered (and thus the complexity of learnable target concepts) to keep search within bounds. Thus, there remains a need for more practical algorithms that can handle domains with both complex feature interactions and irrelevant

From: AAAI Technical Report FS-94-02. Compilation copyright © 1994, AAAI (www.aaai.org). All rights reserved.
Similar resources
Oblivious Decision Trees, Graphs, and Top-Down Pruning
We describe a supervised learning algorithm, EODG, that uses mutual information to build an oblivious decision tree. The tree is then converted to an Oblivious read-Once Decision Graph (OODG) by merging nodes at the same level of the tree. For domains that are appropriate for both decision trees and OODGs, performance is approximately the same as that of C4.5, but the number of nodes in the OOD...
Oblivious Decision Trees and Abstract Cases
In this paper, we address the problem of case-based learning in the presence of irrelevant features. We review previous work on attribute selection and present a new algorithm, OBLIVION, that carries out greedy pruning of oblivious decision trees, which effectively store a set of abstract cases in memory. We hypothesize that this approach will efficiently identify relevant features even when th...
of the AAAI-94 Workshop on Case-Based Reasoning (1994). Seattle, WA: AAAI
In this paper, we address the problem of case-based learning in the presence of irrelevant features. We review previous work on attribute selection and present a new algorithm, Oblivion, that carries out greedy pruning of oblivious decision trees, which effectively store a set of abstract cases in memory. We hypothesize that this approach will efficiently identify relevant features even when they ...