“Boosting” Stumps from Positive Only Data

Author

  • Andrew R. Mitchell
Abstract

Most current learning algorithms require both positive and negative data. This is also the case for many recent ensemble learning techniques. Applications of boosting, for example, rely on both positive and negative data to produce a hypothesis with high predictive accuracy. In this technical report, a learning methodology is presented that does not rely on negative examples. A learning method in this framework is described which shows remarkable similarities to boosting stumps. This is all the more surprising because learning from positive data has traditionally turned out to be very difficult. Empirical results show that this technique successfully boosts stumps from positive data while paying only a small price in accuracy compared to learners that have access to both positive and negative data. Some theoretical justification of the results is also provided.

1 TRADITIONAL LEARNING MODEL

In the traditional learning model, a learner is presented with a finite set of examples together with their corresponding class labels. From this, the learner is required to produce a decision procedure which, given an unlabelled (unclassified) example, returns the correct class label with a high degree of accuracy [Qui86, FS96]. Decision trees have been used very effectively to represent classifiers, and several systems have been built to produce decision trees from positive and negative data [FS96].

Ensemble methods such as bagging and boosting [Qui96, Bre96] combine the outcomes of several different classifiers to form a new (and hopefully improved) classifier. In most cases, the dataset is modified after each classifier is created so that a different classifier is produced each time. Boosting [FS96] is one such ensemble technique. In this method, the first classifier is created from the full data set, with each example weighted equally. After the initial classifier has been built, the weights of the examples are modified so that the original classifier performs no better than chance on the new, weighted data set. This process is repeated to create a series of classifiers. The classifiers are assigned weights according to their estimated error and trial number, and are voted together on any new unclassified data.

Given the power of boosting to improve the classification accuracy of a learner, one does not need to begin with a very good learner. Decision stumps [IL92] are an example of such a weak learner whose accuracy has been shown to improve under boosting. A decision stump is a decision tree of depth one; i.e., a stump consists of a single decision node.

In boosting as presented by Freund and Schapire [FS96], there does not appear to be any direct way to apply boosting to positive-only data, because the error of a hypothesis on the training data is not very meaningful in this setting. For example, the hypothesis that covers all of the data instances will have zero training error, but will most likely be a very poor predictor. In the present technical report, we describe an algorithm that may be considered a modification of boosting applied to decision stumps, in such a way that boosting still occurs despite the learner receiving only positive data.
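To make the boosting scheme sketched above concrete, the following Python fragment is a minimal sketch of the standard AdaBoost loop over decision stumps in the usual setting with both positive and negative labels (in {-1, +1}). It illustrates conventional boosting as described by Freund and Schapire, not the positive-only algorithm developed in this report; the function names and data layout are assumptions made for the example.

```python
import numpy as np

def fit_stump(X, y, w):
    """Pick the (feature, threshold, polarity) stump with the lowest
    weighted error on labels y in {-1, +1} (exhaustive search)."""
    n, d = X.shape
    best = None
    for j in range(d):
        for thr in np.unique(X[:, j]):
            for polarity in (1, -1):
                pred = np.where(polarity * (X[:, j] - thr) > 0, 1, -1)
                err = np.sum(w[pred != y])
                if best is None or err < best[0]:
                    best = (err, j, thr, polarity)
    return best  # (weighted error, feature index, threshold, polarity)

def stump_predict(stump, X):
    _, j, thr, polarity = stump
    return np.where(polarity * (X[:, j] - thr) > 0, 1, -1)

def adaboost_stumps(X, y, n_rounds=20):
    """Standard AdaBoost: reweight the examples so the stump just built
    performs no better than chance, then combine stumps by weighted vote."""
    n = len(y)
    w = np.full(n, 1.0 / n)                     # start with equal weights
    ensemble = []
    for _ in range(n_rounds):
        stump = fit_stump(X, y, w)
        err = max(stump[0], 1e-10)
        if err >= 0.5:                          # weak learner is no better than chance
            break
        alpha = 0.5 * np.log((1 - err) / err)   # weight of this classifier in the vote
        pred = stump_predict(stump, X)
        w *= np.exp(-alpha * y * pred)          # up-weight misclassified examples
        w /= w.sum()
        ensemble.append((alpha, stump))
    return ensemble

def ensemble_predict(ensemble, X):
    votes = sum(a * stump_predict(s, X) for a, s in ensemble)
    return np.sign(votes)
```

On a small labelled dataset, `ensemble_predict(adaboost_stumps(X, y), X)` typically yields a far more accurate classifier than any single stump, which is the behaviour the positive-only method of this report aims to reproduce without the negative labels.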
2 PRELIMINARIES OF AN ALTERNATE METHODOLOGY

For the purposes of learning from positive-only data, we assume the following model. The learner receives a finite set of positive-only examples of some concept, drawn according to some probability distribution. From this data, the learner is required to come up with a procedure that, given any unclassified instance, returns a confidence value [SS98, KS90] in the range [0, 1] that the given instance belongs to the concept. To simplify our treatment, we assume that the instance space is n-dimensional and that the domain of each attribute is the real interval [0, 1]. We also assume that the examples are drawn from a probability distribution function D which is bounded, i.e., there exists a bound B such that D(x) ≤ B for each instance x.

2.1 A LEARNING METHODOLOGY

When learning from positive-only data, we are to some extent trying to recover the probability distribution function that produced the original data. The following proposition provides a basis for achieving this.

Proposition 1. Let D be any distribution satisfying the assumptions noted above, and let F be a monotonic function which maps D to the uniform distribution. Then, for all points x where D is continuous, D(x) = F'(x).

The proposition is illustrated in Figure 1.
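As a rough one-dimensional illustration of Proposition 1 (a sketch under assumptions of our own: the triangular density D(x) = 2x on [0, 1] and the empirical CDF as the monotone map F are chosen purely for illustration and do not come from the report), the numerical derivative of a transform that maps the positive data to the uniform distribution recovers the underlying density:

```python
import numpy as np

# Assumed example density on [0, 1]: D(x) = 2x, with CDF F(x) = x^2.
# F is monotone and maps D to the uniform distribution on [0, 1],
# and F'(x) = 2x = D(x), which is what Proposition 1 asserts.

rng = np.random.default_rng(0)
samples = np.sqrt(rng.uniform(size=100_000))   # inverse-CDF sampling from D

def empirical_F(x, data):
    """Monotone map estimated from positive data alone: the empirical CDF."""
    return np.searchsorted(np.sort(data), x) / len(data)

xs = np.linspace(0.05, 0.95, 10)
h = 0.01
# Central-difference derivative of the estimated transform approximates the density.
density_est = (empirical_F(xs + h, samples) - empirical_F(xs - h, samples)) / (2 * h)
print(np.round(density_est, 2))   # close to the true density 2 * xs
```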

Publication date: 2003