Bagging, Boosting, and C4.5
Author
Abstract
Breiman's bagging and Freund and Schapire's boosting are recent methods for improving the predictive power of classifier learning systems. Both form a set of classifiers that are combined by voting, bagging by generating replicated bootstrap samples of the data, and boosting by adjusting the weights of training instances. This paper reports results of applying both techniques to a system that learns decision trees and testing on a representative collection of datasets. While both approaches substantially improve predictive accuracy, boosting shows the greater benefit. On the other hand, boosting also produces severe degradation on some datasets. A small change to the way that boosting combines the votes of learned classifiers reduces this downside and also leads to slightly better results on most of the datasets considered.
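The bagging procedure described above can be sketched in a few lines. This is a minimal illustration, not the paper's C4.5 setup: the base learner here is a toy 1-nearest-neighbour classifier standing in for a decision-tree learner, and all names (`one_nn`, `bagging`, `n_trials`) are illustrative. Each committee member is trained on a bootstrap replicate of the training data, and the final class is decided by plurality vote.

```python
# Sketch of bagging: bootstrap replicates + plurality voting.
# The base learner is a toy 1-NN on 1-D data, standing in for C4.5.
import random
from collections import Counter

def one_nn(train):
    """Return a 1-nearest-neighbour classifier over (x, label) pairs."""
    def classify(x):
        return min(train, key=lambda p: abs(p[0] - x))[1]
    return classify

def bagging(data, n_trials, base_learner=one_nn, seed=0):
    rng = random.Random(seed)
    committee = []
    for _ in range(n_trials):
        # Bootstrap replicate: |data| points drawn with replacement.
        replicate = [rng.choice(data) for _ in data]
        committee.append(base_learner(replicate))
    def vote(x):
        # Plurality vote over the committee's predictions.
        return Counter(c(x) for c in committee).most_common(1)[0][0]
    return vote

data = [(0.0, "a"), (0.1, "a"), (0.2, "a"),
        (0.9, "b"), (1.0, "b"), (1.1, "b")]
predict = bagging(data, n_trials=10)
```

Boosting differs in that, instead of sampling uniformly with replacement, each round reweights the training instances to emphasise those the previous classifiers misclassified, and the committee's votes are weighted accordingly.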
Related Papers
Bagging, Boosting, and C4.5
Breiman's bagging and Freund and Schapire's boosting are recent methods for improving the predictive power of classifier learning systems. Both form a set of classifiers that are combined by voting, bagging by generating replicated bootstrap samples of the data, and boosting by adjusting the weights of training instances. This paper reports results of applying both techniques to a system that l...
An Empirical Study of Combined Classifiers for Knowledge Discovery on Medical Data Bases
This paper compares the accuracy of combined classifiers in medical data bases to the same knowledge discovery techniques applied to generic data bases. Specifically, we apply Bagging and Boosting methods for 16 medical and 16 generic data bases and compare the accuracy results with a more traditional approach (C4.5 algorithm). Bagging and Boosting methods are applied using different numbers of...
Classifying Unseen Cases with Many Missing Values
Handling missing attribute values is an important issue for classifier learning, since missing attribute values in either training data or test (unseen) data affect the prediction accuracy of learned classifiers. In many real KDD applications, attributes with missing values are very common. This paper studies the robustness of four recently developed committee learning techniques, including Boost...
Stochastic Attribute Selection Committees
Classifier committee learning methods generate multiple classifiers to form a committee by repeated application of a single base learning algorithm. The committee members vote to decide the final classification. Two such methods, Bagging and Boosting, have shown great success with decision tree learning. They create different classifiers by modifying the distribution of the training set. This paper stu...