Technical Reports on Mathematical and Computing Sciences: TR-C139 title: Experimental evaluation of an adaptive boosting by filtering algorithm
Abstract
In this paper we present an empirical comparison of the AdaBoost algorithm with MadaBoost, a modification of it suited to the boosting by filtering framework. In boosting by filtering, one obtains at each stage an unweighted sample drawn at random from the current modified distribution, in contrast with boosting by subsampling, where one uses a weighted sample at each stage. A boosting algorithm for the filtering framework can be very useful when we have a very large dataset and want to reduce the running time of the base learner by random sampling. AdaBoost was originally designed for boosting by subsampling, and it can be shown that it is not suitable for boosting by filtering, since generating a new sample from the modified distribution at each iteration might take exponential time. The modification we propose is very simple: we bound the weights at each step, using their initial value as a saturation bound, so that the weights cannot become arbitrarily large as they can in AdaBoost. We show experimentally that there is no significant difference in accuracy compared with AdaBoost, while MadaBoost is much more efficient in the running time needed to generate a sample from the modified distribution at each step. This difference might be crucial when using boosting with a large dataset.
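The efficiency argument can be illustrated with a minimal Python sketch that assumes the filter is implemented by rejection sampling (an assumption; the abstract does not specify the procedure): draw an example uniformly from the dataset and keep it with probability w_i / max(w). Each draw is then accepted with probability mean(w) / max(w), so the expected number of draws per accepted example is max(w) / mean(w). When weights can grow without bound, a single large weight inflates this ratio; when every weight is capped at its initial value, the ratio stays small as long as the weights do not collectively shrink. The helper names `filter_sample` and `expected_draws_per_sample` are hypothetical.

```python
import random


def expected_draws_per_sample(weights):
    """Expected number of uniform draws needed to accept one example: max(w) / mean(w)."""
    return max(weights) / (sum(weights) / len(weights))


def filter_sample(weights, m, rng):
    """Draw m example indices with probability proportional to weights[i]
    by rejection sampling (the 'filter'): pick an index uniformly at random
    and keep it with probability weights[i] / max(weights).
    Returns (accepted_indices, total_uniform_draws)."""
    w_max = max(weights)
    if w_max <= 0:
        raise ValueError("at least one weight must be positive")
    n = len(weights)
    accepted, draws = [], 0
    while len(accepted) < m:
        i = rng.randrange(n)
        draws += 1
        if rng.random() < weights[i] / w_max:
            accepted.append(i)
    return accepted, draws


if __name__ == "__main__":
    rng = random.Random(0)
    n = 1000
    # Bounded weights: every weight is at most its initial value 1.0.
    bounded = [1.0 if i % 2 == 0 else 0.25 for i in range(n)]
    # Skewed weights: one example's unbounded weight has grown to 10,000.
    skewed = [1.0] * n
    skewed[0] = 1e4
    for name, w in [("bounded", bounded), ("skewed", skewed)]:
        _, draws = filter_sample(w, 100, rng)
        print(name, "observed draws/sample:", draws / 100,
              "expected:", round(expected_draws_per_sample(w), 1))
```

For the two weight vectors in the demo, `expected_draws_per_sample` gives 1.6 in the bounded case and about 909 in the skewed case, where one large weight forces almost every uniform draw to be rejected.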
Similar resources
Technical Reports on Mathematical and Computing Sciences: TR-C138 title: MadaBoost: A modification of
In the last decade, one of the research topics that has received a great deal of attention from the machine learning and computational learning communities has been the so-called boosting techniques. In this paper, we further explore this topic by proposing a new boosting algorithm that mends some of the problems that have been detected in the so far most successful boosting algorithm, AdaBoos...
Technical Reports on Mathematical and Computing Sciences: TR-CXXX title: Algorithmic Aspects of Boosting
We discuss algorithmic aspects of boosting techniques, such as Majority Vote Boosting [Fre95], AdaBoost [FS97], and MadaBoost [DW00a]. Considering a situation where we are given a huge number of examples and asked to find a rule that explains these example data, we show some reasonable algorithmic approaches for dealing with such a huge dataset by boosting techniques. Through this example, ...
Adaptive-Filtering-Based Algorithm for Impulsive Noise Cancellation from ECG Signal
Suppression of noise and artifacts is a necessary step in biomedical data processing. Adaptive filtering is known to be a useful method for overcoming this problem. Among the various contaminants, there are some sources, such as the electrical activity of muscles, that contribute impulsive noise. This paper deals with modeling real-life muscle noise with an α-stable probability distribution and adaptive filterin...
Technical Reports on Mathematical and Computing Sciences: TR-C136 title: Adaptive Sampling Methods for Scaling up Knowledge Discovery Algorithms (Extended Revised Version of TR-C131)
Scalability is a key requirement for any KDD and data mining algorithm, and one of the biggest research challenges is to develop methods that make it possible to use large amounts of data. One possible approach to dealing with huge amounts of data is to take a random sample and do data mining on it, since for many data mining applications approximate answers are acceptable. However, as argued by several ...
Technical Reports on Mathematical and Computing Sciences: TR-C133 title: A Modification of AdaBoost: A Preliminary Report
In this note, we discuss the boosting algorithm AdaBoost and identify two of its main drawbacks: it cannot be used in the boosting by filtering framework, and it is not noise resistant. To solve them, we propose a modification of the weighting system of AdaBoost. We prove that the new algorithm is in fact a boosting algorithm under the condition that the sequence of advantages generated by...
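To make the weighting change concrete, here is a minimal Python sketch. It assumes a multiplicative form in which round t multiplies an example's weight by β_t = sqrt(ε_t / (1 − ε_t)) when that round's hypothesis is correct and by 1/β_t when it is wrong (ε_t being the round's weighted error), and it applies the saturation bound at the initial weight 1.0 to the accumulated product of factors. Both choices are assumptions of this sketch, not details confirmed by the abstract; `weights_after` and `log_beta` are hypothetical names.

```python
import math


def log_beta(eps):
    """Log of the assumed per-round factor beta = sqrt(eps / (1 - eps)).
    Negative whenever the round's weighted error eps is below 1/2."""
    return 0.5 * math.log(eps / (1.0 - eps))


def weights_after(rounds, clamp):
    """Unnormalized example weights after a sequence of boosting rounds.

    rounds: list of (eps, correct) pairs; correct[i] is True when that round's
            hypothesis classified example i correctly.
    clamp:  False -> plain multiplicative (AdaBoost-style) weights;
            True  -> weights saturate at the initial value 1.0 (MadaBoost-style).
    Every example starts with weight 1.0."""
    n = len(rounds[0][1])
    log_w = [0.0] * n
    for eps, correct in rounds:
        lb = log_beta(eps)
        for i, ok in enumerate(correct):
            # Correct: multiply by beta (< 1). Wrong: multiply by 1 / beta (> 1).
            log_w[i] += lb if ok else -lb
    if clamp:
        return [min(1.0, math.exp(v)) for v in log_w]
    return [math.exp(v) for v in log_w]


if __name__ == "__main__":
    # Example 0 behaves like a mislabeled (noisy) example: every round gets it wrong.
    correct = [False, True, True, True]
    rounds = [(0.25, correct)] * 10
    print("uncapped:", weights_after(rounds, clamp=False))  # example 0 reaches about 3**5 = 243
    print("capped:  ", weights_after(rounds, clamp=True))   # example 0 stays at 1.0
```

Under these assumptions, the accumulated product for an example is at least 1 exactly when the vote that weights round t by ln(1/β_t) misclassifies it (or ties), so such examples sit at the cap of 1.0. A persistently mislabeled example therefore stays at its initial weight instead of growing geometrically. When these weights drive a rejection-sampling filter, the mean weight is at least the vote's error rate on the data, so the expected number of draws per accepted example is at most the reciprocal of that error rate.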