Decision Trees: More Theoretical Justification
Abstract
We study impurity-based decision tree algorithms such as CART, C4.5, etc., so as to better understand their theoretical underpinnings. We consider such algorithms on special forms of functions and distributions. We deal with the uniform distribution and functions that can be described as unate functions, linear threshold functions and read-once DNF. For unate functions we show that maximal purity gain and maximal influence are logically equivalent. This leads to the exact identification of unate functions by impurity-based algorithms given sufficiently many noise-free examples. We show that for this class of functions these algorithms build minimal-height decision trees. Then we show that if the unate function is a read-once DNF or a linear threshold function, the decision tree built by these algorithms has the minimal number of nodes among all decision trees representing the function. Based on the statistical query learning model, we introduce the noise-tolerant version of practical decision tree algorithms. We show that when the input examples have small classification noise and are uniformly distributed, all our results for practical noise-free impurity-based algorithms also hold for their noise-tolerant version.
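To make the equivalence between maximal purity gain and maximal influence concrete, here is a small Python sketch. It assumes the uniform distribution over {0,1}^n and uses the Gini index as the impurity function (Gini is the impurity CART uses; the abstract does not single out one impurity function). The helper names `influence`, `gini_gain` and `best_by` and the example function are hypothetical, chosen only for illustration. For a root split on x_i under the uniform distribution, the Gini gain works out to (p_1 - p_0)^2 / 2, where p_b = Pr[f = 1 | x_i = b]; for a unate function |p_1 - p_0| equals the influence of x_i, so ranking variables by gain or by influence gives the same winner.

```python
from fractions import Fraction
from itertools import product


def cube(n):
    """All 2**n points of {0,1}^n; each has probability 2**-n under the uniform distribution."""
    return list(product((0, 1), repeat=n))


def influence(f, n, i):
    """Pr_x[f(x) != f(x with bit i flipped)] for x uniform on {0,1}^n."""
    points = cube(n)
    changed = 0
    for x in points:
        y = list(x)
        y[i] ^= 1  # flip coordinate i
        if f(x) != f(tuple(y)):
            changed += 1
    return Fraction(changed, len(points))


def gini(p):
    """Gini impurity 2p(1-p) of a Boolean label with Pr[label = 1] = p."""
    return 2 * p * (1 - p)


def gini_gain(f, n, i):
    """Drop in Gini impurity when the root is split on x_i (uniform distribution)."""
    points = cube(n)
    p = Fraction(sum(f(x) for x in points), len(points))
    half = len(points) // 2  # branches x_i = 0 and x_i = 1 each hold exactly half the points
    p0 = Fraction(sum(f(x) for x in points if x[i] == 0), half)
    p1 = Fraction(sum(f(x) for x in points if x[i] == 1), half)
    # Each branch has weight 1/2 under the uniform distribution.
    return gini(p) - (gini(p0) + gini(p1)) / 2


def best_by(score, f, n):
    """Index of the variable maximizing score(f, n, i); ties go to the smallest index."""
    return max(range(n), key=lambda i: (score(f, n, i), -i))


# Hypothetical example: f(x) = x0 OR (x1 AND x2) on 4 variables (x3 is irrelevant).
# It is monotone (hence unate) and a read-once DNF.
def f(x):
    return int(x[0] or (x[1] and x[2]))


n = 4
for i in range(n):
    print(i, influence(f, n, i), gini_gain(f, n, i))
print("max influence:", best_by(influence, f, n))
print("max Gini gain:", best_by(gini_gain, f, n))
```

Running it prints influences 3/4, 1/4, 1/4, 0 and Gini gains 9/32, 1/32, 1/32, 0, so both criteria choose x0 for the root. The sketch enumerates all 2^n inputs and uses exact fractions instead of sampling, so it only illustrates the population quantities for small n; a practical learner would estimate them from examples.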
Similar resources
Decision Trees: More Theoretical Justification for Practical Algorithms (Tel-Aviv University, Raymond and Beverly Sackler Faculty of Exact Sciences, School of Computer Science)
We study impurity-based decision tree algorithms such as CART, C4.5, etc., so as to better understand their theoretical underpinnings. We consider such algorithms on special forms of functions and distributions. We deal with the uniform distribution and functions that can be described as boolean linear threshold functions and read-once DNF. We show that for boolean linear threshold function...
Intuition and the junctures of judgment in decision procedures for clinical ethics.
Moral decision procedures such as principlism or casuistry require intuition at certain junctures, as when a principle seems indeterminate, or principles conflict, or we wonder which paradigm case is most relevantly similar to the instant case. However, intuitions are widely thought to lack epistemic justification, and many ethicists urge that such decision procedures dispense with intuition in...
Decision Trees: More Theoretical Justification for Practical Algorithms
We study impurity-based decision tree algorithms such as CART, C4.5, etc., so as to better understand their theoretical underpinnings. We consider such algorithms on special forms of functions and distributions. We deal with the uniform distribution and functions that can be described as boolean linear threshold functions or read-once DNF. We show that for boolean linear threshold functions...
Rule Extraction from Ensemble Methods Using Aggregated Decision Trees
Ensemble methods have become very well known for being powerful pattern recognition algorithms capable of achieving high accuracy. However, ensemble methods produce learners that are not comprehensible or transferable, which makes them unsuitable for tasks that require a rational justification for making a decision. Rule extraction methods can resolve this limitation by extracting comprehensibl...
Boosting with Multi-Way Branching in Decision Trees
It is known that decision tree learning can be viewed as a form of boosting. However, existing boosting theorems for decision tree learning allow only binary-branching trees and the generalization to multi-branching trees is not immediate. Practical decision tree algorithms, such as CART and C4.5, implement a trade-off between the number of branches and the improvement in tree quality as measur...