Ensembles for Supervised Classification

Authors

  • David E. Rumelhart
  • Michael R. Genesereth
  • Nils J. Nilsson
Abstract

This dissertation studies the use of multiple classifiers (ensembles or committees) in learning tasks. Both theoretical and practical aspects of combining classifiers are studied. We consider two different goals. The first is to achieve better classification rates. We analyze both the representation ability of ensembles and algorithms that search for a solution in this representation space. Second, we consider the cost and time it takes to train classifiers. This leads us to consider systems that either learn with fewer labeled examples or learn while performing at their current ability.

First we analyze the representational ability of voting ensembles. A voting ensemble may perform either better or worse than each of its individual members. We give tight upper and lower bounds on the classification performance of a voting ensemble as a function of the classification performances of its individual members.

Boosting is a method of combining multiple "weak" classifiers to form a "strong" classifier. Several issues concerning boosting are studied in this thesis. We study SBA, a hierarchical boosting algorithm proposed by Schapire, in terms of its representation and its search. We show that if the weak learner has low representational complexity, SBA's search may fail to boost or may give a sub-optimal solution. We present a rejection boosting algorithm that trades off exploration and exploitation: it requires fewer pattern labels at the expense of lower boosting ability.

Ensembles may be useful in gaining information. We study their use to minimize labeling costs of data and to enable improvements on performance over time. For that purpose a model for on-site learning is presented. The system learns by querying "hard" patterns while classifying "easy" ones. This model is related to query-based filtering methods, but takes into account that, in addition to labeling, filtering through the data has a cost. The Query-By-Committee algorithm is used as a good approximator of the model space
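The abstract's remark that a voting ensemble may perform better or worse than each of its members can be illustrated with a small sketch. The code below is not taken from the dissertation. It assumes a binary task, three members, simple majority voting, and a hypothetical set of ten examples on which each member is correct exactly six times. The function name `majority_accuracy` and the two hand-built arrangements are illustrative assumptions only.

```python
def majority_accuracy(correct_by_member):
    """Fraction of examples on which a strict majority of members is correct.

    correct_by_member: one list of 0/1 flags per member, where flag i is 1
    if that member classifies example i correctly (binary task assumed).
    """
    n_members = len(correct_by_member)
    n_examples = len(correct_by_member[0])
    wins = 0
    for i in range(n_examples):
        votes = sum(member[i] for member in correct_by_member)
        if 2 * votes > n_members:  # strict majority of correct votes
            wins += 1
    return wins / n_examples


def flags(correct_indices, n_examples=10):
    """Build a 0/1 correctness list from the indices a member gets right."""
    return [1 if i in correct_indices else 0 for i in range(n_examples)]


# Favorable arrangement (hypothetical): correct votes are spread so that
# nine examples each receive exactly two of the three correct votes.
favorable = [
    flags({0, 1, 2, 3, 4, 5}),
    flags({0, 1, 2, 6, 7, 8}),
    flags({3, 4, 5, 6, 7, 8}),
]

# Unfavorable arrangement (hypothetical): all members agree on four examples,
# and the remaining correct votes are wasted as lone minority votes.
unfavorable = [
    flags({0, 1, 2, 3, 4, 5}),
    flags({0, 1, 2, 3, 6, 7}),
    flags({0, 1, 2, 3, 8, 9}),
]

member_accuracies = [sum(m) / len(m) for m in favorable + unfavorable]
best = majority_accuracy(favorable)
worst = majority_accuracy(unfavorable)
print(member_accuracies, best, worst)
```

In both arrangements every member has accuracy 0.6, yet the majority vote is correct on 9 of 10 examples in the first and on only 4 of 10 in the second. How the members' errors overlap, not just their individual accuracies, determines the ensemble's performance.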


Similar resources

A High-Performance Model based on Ensembles for Twitter Sentiment Classification

Background and Objectives: Twitter Sentiment Classification is one of the most popular fields in information retrieval and text mining. Millions of people around the world use social networks like Twitter intensively. It lets users publish tweets to say what they think about topics. There are numerous web sites built on the Internet presenting Twitter. The user can enter a sentiment ta...


Insights into Machine Learning: Data Clustering and Classification Algorithms for Astrophysical Experiments

Data analysis domain dealing with data exploration, clustering and classification is an important problem in many experiments of astrophysics, computer vision, bioinformatics etc. The field of machine learning is increasingly becoming popular for performing these tasks. In this thesis we deal with machine learning models based on unsupervised and supervised learning algorithms. In unsupervised ...


Determination of Best Supervised Classification Algorithm for Land Use Maps using Satellite Images (Case Study: Baft, Kerman Province, Iran)

According to the fundamental goal of remote sensing technology, the image classification of desired sensors can be introduced as the most important part of satellite image interpretation. Various algorithms exist for supervised land use classification, and the most pertinent one should be determined. Therefore, this study has been conducted to determine the best and most su...


Building Ensembles of Classifiers for Loss Minimization

One of the most active areas of research in supervised learning has been the study of methods for constructing good ensembles of classifiers, that is, a set of classifiers whose individual decisions are combined to increase overall accuracy of classifying new examples. In many applications classifiers are required to minimize an asymmetric loss function rather than the raw misclassification rate. ...


Parallel computation of kernel density estimates classifiers and their ensembles

Nonparametric supervised classifiers are interesting because they do not require distributional assumptions for the class conditional density, such as normality or equal covariance. However their use is not widespread because it takes a lot of time to compute them due to the intensive use of the available data. On the other hand bundling classifiers to produce a single one, known as an ensemble...


Classification on Soft Labels Is Robust against Label Noise

In a scenario of supervised classification of data, labeled training data is essential. Unfortunately, the process by which those labels are obtained is not error-free, for example due to human nature. The aim of this work is to find out what impact noise on the labels has, and we do so by artificially adding it. An algorithm for the noising procedure is described. Not only individual classifie...



Publication date: 1998