Efficient Estimation of Generalization Error and Bias-Variance Components of Ensembles

Authors

  • Dhruv Kumar Mahajan
  • Vivek Gupta
  • S. Sathiya Keerthi
  • Sellamanickam Sundararajan
  • Shravan Narayanamurthy
  • Rahul Kidambi
Abstract

For many applications, an ensemble of base classifiers is an effective solution. Tuning its parameters (the number of classifiers, the amount of data on which each classifier is trained, etc.) requires G, the generalization error of a given ensemble. The efficient estimation of G is the focus of this paper. The key idea is to approximate the variance of the class scores/probabilities of the base classifiers, over the randomness imposed by the training subset, by a normal/beta distribution at each point x in the input feature space. We estimate the parameters of the distribution using a small set of randomly chosen base classifiers and use those parameters to give efficient estimation schemes for G. We give empirical evidence for the quality of the various estimators. We also demonstrate their usefulness in making design choices, such as the number of classifiers in the ensemble and the size of the training subset needed to achieve a certain value of generalization error. Our approach also has great potential for designing distributed ensemble classifiers.
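As a rough illustration of the idea in the abstract, the following Python sketch estimates G for a binary ensemble under a normal approximation. It is not the paper's estimator: the function names, the labels in {-1, +1}, the score-averaging ensemble, and the independence of base-classifier scores at each point are all assumptions made for this example.

```python
import math


def normal_cdf(z):
    """Standard normal CDF, computed from the error function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def estimate_generalization_error(scores, labels, ensemble_size):
    """Hypothetical estimator of G under a normal approximation.

    scores[i][j] is the score of the j-th sampled base classifier at
    validation point i (at least two sampled classifiers per point).
    labels[i] is assumed to be -1 or +1.
    The ensemble is assumed to average `ensemble_size` base scores that
    are independent at each point, so its score at point i is treated as
    normal with mean mu_i and variance var_i / ensemble_size.
    """
    total = 0.0
    for point_scores, y in zip(scores, labels):
        k = len(point_scores)
        mu = sum(point_scores) / k
        # Unbiased sample variance of the base-classifier scores at this point.
        var = sum((s - mu) ** 2 for s in point_scores) / (k - 1)
        std_of_mean = math.sqrt(var / ensemble_size)
        if std_of_mean == 0.0:
            # No spread observed: the error at this point is deterministic.
            total += 1.0 if y * mu <= 0 else 0.0
        else:
            # Probability that the averaged score has the wrong sign.
            total += normal_cdf(-y * mu / std_of_mean)
    return total / len(labels)
```

Under these assumptions, the estimate for any ensemble size is obtained from the same small set of sampled base classifiers, which is what makes sweeping over the number of classifiers cheap.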

Similar resources

Ensemble Methods Based on Bias–variance Analysis

Ensembles of classifiers represent one of the main research directions in machine learning. Two main theories are invoked to explain the success of ensemble methods. The first considers ensembles in the framework of large margin classifiers, showing that ensembles enlarge the margins, enhancing the generalization capabilities of learning algorithms. The second is based on the classical b...

An Efficient Method to Estimate Bagging's Generalization Error

Bagging [1] is a technique that tries to improve a learning algorithm's performance by using bootstrap replicates of the training set [5, 4]. The computational requirements for estimating the resultant generalization error on a test set by means of cross-validation are often prohibitive; for leave-one-out cross-validation one needs to train the underlying algorithm on the order of m times, where...

Estimation of Parameters for an Extended Generalized Half Logistic Distribution Based on Complete and Censored Data

This paper considers an Extended Generalized Half Logistic distribution. We derive some properties of this distribution and then discuss estimation of the distribution parameters by the method of moments, maximum likelihood, and the new minimum spacing distance estimator, based on complete data. Also, maximum likelihood equations for estimating the parameters based on Type-I and Typ...

Supervised projection approach for boosting classifiers

In this paper we present a new approach for boosting methods for the construction of ensembles of classifiers. The approach is based on using the distribution given by the weighting scheme of boosting to construct a non-linear supervised projection of the original variables, instead of using the weights of the instances to train the next classifier. With this method we construct ensembles that ...

Estimation of Synthetic Unit Hydrograph Using Regional Flood Analysis and Geomorphological Parameters (Case Study: Marenj and Kani-Savaran Watersheds, Kurdistan)

Estimation of the flood hydrograph is a necessity in hydrological studies such as flood mitigation projects. In ungauged watersheds, this estimation is usually carried out using the geomorphological characteristics of the watersheds. The objective of this research is to estimate the synthetic unit hydrograph using regional flood frequency analysis and geomorphological parameters of watersheds. 1-hour and 2-...

Journal title:
  • CoRR

Volume abs/1711.05482

Publication date 2017