A Fast, Bottom-Up Decision Tree Pruning Algorithm with Near-Optimal Generalization

Authors

  • Michael Kearns
  • Yishay Mansour
Abstract

In this work, we present a new bottom-up algorithm for decision tree pruning that is very efficient (requiring only a single pass through the given tree), and prove a strong performance guarantee for the generalization error of the resulting pruned tree. We work in the typical setting in which the given tree T may have been derived from the given training sample S, and thus may badly overfit S. In this setting, we give bounds on the amount of additional generalization error that our pruning suffers compared to the optimal pruning of T. More generally, our results show that if there is a pruning of T with small error, and whose size is small compared to |S|, then our algorithm will find a pruning whose error is not much larger. This style of result has been called an index of resolvability result by Barron and Cover in the context of density estimation. A novel feature of our algorithm is its locality: the decision to prune a subtree is based entirely on properties of that subtree and the sample reaching it. To analyze our algorithm, we develop tools of local uniform convergence, a generalization of the standard notion that may prove useful in other settings.
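
The locality property lends itself to a simple recursive traversal. The following is a minimal Python sketch, not the authors' exact procedure: the Node structure, the routine names, and in particular the local_penalty formula are hypothetical placeholders, whereas the paper's actual pruning criterion is derived from local uniform convergence bounds. The sketch only illustrates the overall shape of the computation: children are pruned first, and each subtree is then either kept or collapsed to a majority leaf using only the subtree itself and the portion of the sample reaching it.

from collections import Counter
from dataclasses import dataclass
from typing import List, Optional, Tuple
import math


@dataclass
class Node:
    # Internal node: feature/threshold/left/right are set; leaf: only label is set.
    feature: Optional[int] = None
    threshold: float = 0.0
    left: Optional["Node"] = None
    right: Optional["Node"] = None
    label: Optional[int] = None


def size(node: Node) -> int:
    # Number of nodes in the subtree rooted at `node`.
    if node.feature is None:
        return 1
    return 1 + size(node.left) + size(node.right)


def predict(node: Node, x: List[float]) -> int:
    # Follow splits until a leaf is reached.
    while node.feature is not None:
        node = node.left if x[node.feature] <= node.threshold else node.right
    return node.label


def local_penalty(subtree_size: int, n_local: int, delta: float = 0.05) -> float:
    # Hypothetical complexity charge: grows with subtree size, shrinks with the
    # amount of local data. A stand-in for the paper's actual bound.
    if n_local == 0:
        return float("inf")
    return math.sqrt((subtree_size + math.log(1.0 / delta)) / n_local)


def prune(node: Node, sample: List[Tuple[List[float], int]]) -> Node:
    # Single bottom-up pass: prune both children first, then decide locally
    # whether collapsing this subtree to a majority leaf is (penalized) no worse.
    labels = [y for _, y in sample]
    majority = Counter(labels).most_common(1)[0][0] if labels else 0
    node.label = majority

    if node.feature is None:
        return node  # already a leaf

    # Route the local sample to the children and recurse.
    left_sample = [(x, y) for x, y in sample if x[node.feature] <= node.threshold]
    right_sample = [(x, y) for x, y in sample if x[node.feature] > node.threshold]
    node.left = prune(node.left, left_sample)
    node.right = prune(node.right, right_sample)

    # Local decision: leaf error vs. subtree error plus a complexity penalty.
    n = len(sample)
    if n == 0:
        return Node(label=majority)
    leaf_err = sum(1 for _, y in sample if y != majority) / n
    subtree_err = sum(1 for x, y in sample if predict(node, x) != y) / n
    if leaf_err <= subtree_err + local_penalty(size(node), n):
        return Node(label=majority)  # replace the whole subtree by a leaf
    return node

Calling prune(root, training_sample) performs the entire pruning in one pass over the tree; only the comparison rule in the final if-statement would need to change to implement the criterion actually analyzed in the paper.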

Similar Articles

Subtree Replacement in Decision Tree Simplification

The current availability of efficient algorithms for decision tree induction makes intricate post-processing techniques worth investigating, both for efficiency and effectiveness. We study the simplification operator of subtree replacement, also known as grafting, originally implemented in the C4.5 system. We present a parametric bottom-up algorithm integrating grafting with the standard pr...

The Biases of Decision Tree Pruning Strategies

Post pruning of decision trees has been a successful approach in many real-world experiments, but over all possible concepts it does not bring any inherent improvement to an algorithm's performance. This work explores how a PAC-proven decision tree learning algorithm fares in comparison with two variants of the normal top-down induction of decision trees. The algorithm does not prune its hypoth...

Mixed Decision Trees: An Evolutionary Approach

In the paper, a new evolutionary algorithm (EA) for mixed tree learning is proposed. In non-terminal nodes of a mixed decision tree different types of tests can be placed, ranging from a typical univariate inequality test up to a multivariate test based on a splitting hyperplane. In contrast to classical top-down methods, our system searches for an optimal tree in a global manner, i.e. it learn...

The Difficulty of Reduced Error Pruning of Leveled Branching Programs

Induction of decision trees is one of the most successful approaches to supervised machine learning. Branching programs are a generalization of decision trees and, by the boosting analysis, exponentially more efficiently learnable than decision trees. In experiments this advantage has not been seen to materialize. Decision trees are easy to simplify using pruning. For branching programs no such a...

Publication year: 1998