Machine Learning Predicting nearly as well as the best pruning of a decision tree

Author

  • David P. Helmbold
Abstract

Many algorithms for inferring a decision tree from data involve a two-phase process. First, a very large decision tree is grown, which typically ends up overfitting the data. To reduce overfitting, in the second phase the tree is pruned using one of a number of available methods. The final tree is then output and used for classification on test data. In this paper, we suggest an alternative approach to the pruning phase. Using a given unpruned decision tree, we present a new method of making predictions on test data, and we prove that our algorithm's performance will not be "much worse" (in a precise technical sense) than the predictions made by the best reasonably small pruning of the given decision tree. Thus, our procedure is guaranteed to be competitive (in terms of the quality of its predictions) with any pruning algorithm. We prove that our procedure is very efficient and highly robust. Our method can be viewed as a synthesis of two previously studied techniques. First, we apply Cesa-Bianchi et al.'s results on predicting using expert advice (where we view each pruning as an expert) to obtain an algorithm that has provably low prediction loss, but that is computationally infeasible. Next, we generalize and apply a method developed by Buntine and by Willems, Shtarkov and Tjalkens to derive a very efficient implementation of this procedure.
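The first of the two steps in the abstract, treating every pruning as an expert, can be illustrated with a brute-force sketch. This is not the paper's algorithm as published: it assumes a complete binary tree whose nodes are named by bit-string paths, fixed per-node predictions in [0, 1], squared loss, a uniform prior over prunings, and a learning rate `eta = 0.5`, none of which come from the abstract. All function names are hypothetical. It enumerates every pruning explicitly, which is exactly the computationally infeasible part that the efficient implementation avoids.

```python
import math
from itertools import product

def enumerate_prunings(node, max_depth):
    """Return every pruning of the complete binary subtree rooted at `node`.
    A pruning is a frozenset of the bit-string paths that act as its leaves."""
    result = [frozenset([node])]  # option 1: cut here, so `node` becomes a leaf
    if len(node) < max_depth:     # option 2: keep both children, prune each side
        left = enumerate_prunings(node + "0", max_depth)
        right = enumerate_prunings(node + "1", max_depth)
        result += [a | b for a, b in product(left, right)]
    return result

def pruning_prediction(leaves, node_pred, x):
    """Walk down from the root along the bits of x until a leaf of the pruning."""
    path = ""
    while path not in leaves:
        path += x[len(path)]
    return node_pred[path]

def run_weighted_prunings(node_pred, max_depth, examples, eta=0.5):
    """Exponentially weighted average over all prunings (assumed: squared loss,
    uniform initial weights). Returns (algorithm loss, best pruning loss, count)."""
    prunings = enumerate_prunings("", max_depth)
    weights = [1.0 / len(prunings)] * len(prunings)
    pruning_loss = [0.0] * len(prunings)
    total_loss = 0.0
    for x, y in examples:
        preds = [pruning_prediction(p, node_pred, x) for p in prunings]
        # Predict with the weight-normalized average of the prunings' predictions.
        y_hat = sum(w * q for w, q in zip(weights, preds)) / sum(weights)
        total_loss += (y_hat - y) ** 2
        # Multiplicative update: each pruning is penalized by its own loss.
        for i, q in enumerate(preds):
            loss = (q - y) ** 2
            pruning_loss[i] += loss
            weights[i] *= math.exp(-eta * loss)
    return total_loss, min(pruning_loss), len(prunings)

if __name__ == "__main__":
    # Hypothetical depth-2 tree: one prediction per node, keyed by its path.
    node_pred = {"": 0.5, "0": 0.2, "1": 0.8,
                 "00": 0.0, "01": 1.0, "10": 1.0, "11": 0.5}
    examples = [("00", 0.0), ("01", 1.0), ("10", 1.0), ("11", 1.0)]
    print(run_weighted_prunings(node_pred, 2, examples))
```

A complete tree of depth 2 already has 5 prunings, and the count grows doubly exponentially with depth, so this direct enumeration is only usable on toy trees.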


Similar resources

Anomaly Detection Using SVM as Classifier and Decision Tree for Optimizing Feature Vectors

With the advancement and development of computer network technologies, the way for intruders has become smoother; therefore, to detect threats and attacks, the importance of intrusion detection systems (IDS) as one of the key elements of security is increasing. One of the challenges of intrusion detection systems is managing the large number of network traffic features. Removing un...


Predicting Nearly as well as the best Pruning of a Planar Decision Graph

We design efficient on-line algorithms that predict nearly as well as the best pruning of a planar decision graph. We assume that the graph has no cycles. As in the previous work on decision trees, we implicitly maintain one weight for each of the prunings (exponentially many). The method works for a large class of algorithms that update their weights multiplicatively. It can also be used to desi...


Evaluating the Performance of the Decision Tree Model in Estimating River Suspended Sediment (Case Study: Ilam Dam Basin)

Accurate estimation of the volume of sediment carried by rivers is very important in water projects. In fact, identifying the most important ways to calculate sediment discharge has been the objective of most research projects. Among these methods are machine learning methods such as decision tree models (which are based on the principles of learning). Decision...


Exploring Gene Signatures in Different Molecular Subtypes of Gastric Cancer (MSS/TP53+, MSS/TP53-): A Network-based and Machine Learning Approach

Gastric cancer (GC) is one of the leading causes of cancer mortality worldwide. Molecular understanding of the different subtypes of GC is still poor, and it is necessary to develop new subtype-specific diagnostic and therapeutic approaches. Comprehensive research in this area is therefore needed to gain deeper insight into the molecular processes underlying these subtypes. In this st...


Comparative Analysis of Machine Learning Algorithms with Optimization Purposes

The fields of optimization and machine learning are increasingly intertwined, and optimization in different problems leads to the use of machine learning approaches. Machine learning algorithms work in reasonable computational time for specific classes of problems and play an important role in extracting knowledge from large amounts of data. In this paper, a methodology has been employed to opt...



Publication date: 2006