Toward a Theoretical Understanding of Why and When Decision Tree Pruning Algorithms Fail

Abstract

Recent empirical studies revealed two surprising pathologies of several common decision tree pruning algorithms. First, tree size is often a linear function of training set size, even when additional tree structure yields no increase in accuracy. Second, building trees with data in which the class label and the attributes are independent often results in large trees. In both cases, the pruning algorithms fail to control tree growth as one would expect them to. We explore this behavior theoretically by constructing a statistical model of reduced error pruning. The model explains why and when the pathologies occur, and makes predictions about how to lessen their effects. The predictions are operationalized in a variant of reduced error pruning that is shown to control tree growth far better than the original algorithm.
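To make the second pathology concrete, the following is a minimal sketch of textbook reduced error pruning applied to data whose labels are drawn independently of the attributes. It is not the paper's statistical model or its improved variant. The binary attributes, the fixed split order (instead of an information-based split criterion), the data set sizes, and all function names are assumptions made for illustration only.

```python
import random

def majority(rows):
    """Majority label of (attributes, label) rows; ties go to 1, empty input to 0."""
    if not rows:
        return 0
    ones = sum(y for _, y in rows)
    return 1 if 2 * ones >= len(rows) else 0

def build(rows, attrs):
    """Grow an unpruned tree, splitting on attributes in a fixed order (assumption)."""
    node = {"label": majority(rows), "attr": None, "kids": None}
    if len({y for _, y in rows}) <= 1 or not attrs:
        return node  # pure node or no attributes left: stay a leaf
    a, rest = attrs[0], attrs[1:]
    left = [r for r in rows if r[0][a] == 0]
    right = [r for r in rows if r[0][a] == 1]
    if not left or not right:
        return build(rows, rest)  # attribute does not separate the rows; skip it
    node["attr"] = a
    node["kids"] = (build(left, rest), build(right, rest))
    return node

def predict(node, x):
    """Follow the splits down to a leaf and return its label."""
    while node["attr"] is not None:
        node = node["kids"][x[node["attr"]]]
    return node["label"]

def errors(node, rows):
    """Number of rows the (sub)tree misclassifies."""
    return sum(predict(node, x) != y for x, y in rows)

def size(node):
    """Total number of nodes, internal nodes and leaves."""
    if node["attr"] is None:
        return 1
    return 1 + sum(size(k) for k in node["kids"])

def reduced_error_prune(node, prune_rows):
    """Bottom-up pruning: collapse a subtree to a leaf whenever the leaf makes
    no more errors than the subtree on the pruning rows that reach it."""
    if node["attr"] is None:
        return node
    a = node["attr"]
    parts = ([r for r in prune_rows if r[0][a] == 0],
             [r for r in prune_rows if r[0][a] == 1])
    node["kids"] = tuple(reduced_error_prune(k, p)
                         for k, p in zip(node["kids"], parts))
    leaf_errors = sum(node["label"] != y for _, y in prune_rows)
    if leaf_errors <= errors(node, prune_rows):
        node["attr"], node["kids"] = None, None
    return node

def random_rows(n, n_attrs, rng):
    """Rows whose label is independent of the attributes."""
    return [(tuple(rng.randint(0, 1) for _ in range(n_attrs)), rng.randint(0, 1))
            for _ in range(n)]

rng = random.Random(0)
train = random_rows(400, 8, rng)   # used to grow the tree
prune = random_rows(200, 8, rng)   # held out for pruning

tree = build(train, list(range(8)))
size_before, err_before = size(tree), errors(tree, prune)
tree = reduced_error_prune(tree, prune)
size_after, err_after = size(tree), errors(tree, prune)
print("nodes before/after pruning:", size_before, size_after)
print("pruning-set errors before/after:", err_before, err_after)
```

Because the labels carry no signal, the ideal pruned tree is a single leaf. Comparing the node count after pruning with 1 for different training and pruning set sizes gives a rough feel for how much structure survives pruning in this toy setting.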

Related resources

Toward a Theoretical Understanding of Why and When Decision Tree Pruning Algorithms Fail
Keywords: decision-tree learning, theory of model selection and evaluation

Recent empirical studies revealed two surprising pathologies of several common decision tree pruning algorithms. First, tree size is often a linear function of training set size, even when additional tree structure yields no increase in accuracy. Second, building trees with data in which the class label and the attributes are independent often results in large trees. In both cases, the pruning ...

Experiments with an innovative tree pruning algorithm

The pruning phase is one of the necessary steps in decision tree induction. Existing pruning algorithms tend to have some or all of the following difficulties: 1) lack of theoretical support; 2) high computational complexity; 3) dependence on validation; 4) complicated implementation. The 2-norm pruning algorithm proposed here addresses all of the above difficulties. This paper demonstrates the...

Induction of Modular Classification Rules: Using Jmax-pruning

The Prism family of algorithms induces modular classification rules which, in contrast to decision tree induction algorithms, do not necessarily fit together into a decision tree structure. Classifiers induced by Prism algorithms achieve a comparable accuracy compared with decision trees and in some cases even outperform decision trees. Both kinds of algorithms tend to overfit on large and nois...

Pruning Decision Trees and Lists

Machine learning algorithms are techniques that automatically build models describing the structure at the heart of a set of data. Ideally, such models can be used to predict properties of future data points and people can use them to analyze the domain from which the data originates. Decision trees and lists are potentially powerful predictors and embody an explicit representation of the struc...

Publication date: 1999