Bagging Is a Small-Data-Set Phenomenon
Authors
Abstract
Bagging forms a committee of classifiers by bootstrap aggregation of training sets from a pool of training data. A simple alternative to bagging is to partition the data into disjoint subsets. Experiments on various datasets show that, given the same size partitions and bags, disjoint partitions result in better performance than bootstrap aggregates (bags). Many applications (e.g., protein structure prediction) involve use of datasets that are too large to handle in the memory of the typical computer. Our results indicate that, in such applications, the simple approach of creating a committee of classifiers from disjoint partitions is to be preferred over the more complex approach of bagging.
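The two ways of building training sets contrasted in the abstract can be sketched as follows. This is a minimal illustration, not the authors' experimental code: the function names, the dataset size of 1000, the committee size of 4, and the plurality-vote combiner are all assumptions made for the example, and the abstract does not say which base classifier or combination rule was used.

```python
import random
from collections import Counter


def disjoint_partitions(n_examples, n_members, rng):
    """Shuffle the indices 0..n_examples-1 and deal them into n_members disjoint subsets."""
    indices = list(range(n_examples))
    rng.shuffle(indices)
    # Striding by n_members gives every subset a different slice of the shuffled list.
    return [indices[i::n_members] for i in range(n_members)]


def bootstrap_bags(n_examples, n_members, bag_size, rng):
    """Draw n_members bags of bag_size indices each, sampling with replacement."""
    return [
        [rng.randrange(n_examples) for _ in range(bag_size)]
        for _ in range(n_members)
    ]


def majority_vote(predictions):
    """Combine the members' predicted labels for one example by plurality vote (assumed rule)."""
    return Counter(predictions).most_common(1)[0][0]


rng = random.Random(0)  # fixed seed so the sketch is repeatable
# Same-size training sets, as in the comparison the abstract describes:
partitions = disjoint_partitions(1000, 4, rng)  # 4 subsets of 250, no index shared
bags = bootstrap_bags(1000, 4, 250, rng)        # 4 bags of 250, repeats allowed
```

Each index list would select the training rows for one committee member. With disjoint partitions every example is used exactly once across the committee, whereas same-size bags can repeat some examples and leave others out.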
Similar resources
Experimental study for the comparison of classifier combination methods
In this paper, we compare the performances of classifier combination methods (bagging, modified random subspace method, classifier selection, parametric fusion) to logistic regression in consideration of various characteristics of input data. Four factors used to simulate the logistic model are: (a) combination function among input variables, (b) correlation between input variables, (c) varianc...
Improving the Robustness of Bagging with Reduced Sampling Size
Bagging is a simple and robust classification algorithm in the presence of class label noise. This algorithm builds an ensemble of classifiers by bootstrapping samples with replacement of size equal to the original training set. However, several studies have shown that this choice of sampling size is arbitrary in terms of generalization performance of the ensemble. In this study we discuss how ...
Bagging KNN Classifiers using Different Expert Fusion Strategies
An experimental evaluation of Bagging K-nearest neighbor classifiers (KNN) is performed. The goal is to investigate whether varying soft methods of aggregation would yield better results than Sum and Vote. We evaluate the performance of Sum, Product, MProduct, Minimum, Maximum, Median and Vote under varying parameters. The results over different training set sizes show minor improvement due to ...
A Study Of Bagging And Boosting Approaches To Develop Meta-Classifier
Classification is one of the data mining techniques that analyses a given data set and induces a model for each class based on the features present in the data. Bagging and boosting are heuristic approaches to develop classification models. These techniques generate a diverse ensemble of classifiers by manipulating the training data given to a base learning algorithm. They are very successfu...
Bagging Using Statistical Queries
Bagging is an ensemble method that relies on random resampling of a data set to construct models for the ensemble. When only statistics about the data are available, but no individual examples, the straightforward resampling procedure cannot be implemented. The question is then whether bagging can somehow be simulated. In this paper we propose a method that, instead of computing certain heurist...