Boosted Regression (Boosting): An introductory tutorial and a Stata plugin
Abstract
Boosting, or boosted regression, is a recent data mining technique that has shown considerable success in predictive accuracy. This article gives an overview of boosting and introduces a new Stata command, boost, that implements the boosting algorithm described in Hastie et al. (2001, p. 322). The plugin is illustrated with a Gaussian and a logistic regression example. In the Gaussian regression example, the R² value computed on a test data set is R² = 21.3% for linear regression and R² = 93.8% for boosting. In the logistic regression example, stepwise logistic regression correctly classifies 54.1% of the observations in a test data set versus 76.0% for boosted logistic regression. Currently, boost accommodates Gaussian (normal), logistic, and Poisson boosted regression. boost is implemented as a Windows C++ plugin.
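As a rough illustration of the idea behind boosted Gaussian regression, the following Python sketch repeatedly fits a one-split regression stump to the current residuals and adds a shrunken copy of it to the model. It is not the boost plugin and does not claim to reproduce the algorithm in Hastie et al. step for step; the function names, the single-predictor setup, the toy data, and the settings n_trees=200 and shrinkage=0.1 are all assumptions chosen for illustration.

```python
# Minimal sketch of boosted regression under squared-error (Gaussian) loss.
# Assumptions: one numeric predictor, stumps as base learners, and
# hypothetical helper names (fit_stump, boost_fit, boost_predict).


def fit_stump(x, r):
    """Find the single split on x that best fits the residuals r.

    Returns (threshold, left_mean, right_mean).
    """
    xs = sorted(set(x))
    overall = sum(r) / len(r)
    if len(xs) < 2:
        # No split is possible; fall back to a constant prediction.
        return xs[0], overall, overall
    best = None
    for lo, hi in zip(xs, xs[1:]):
        t = (lo + hi) / 2.0
        left = [ri for xi, ri in zip(x, r) if xi <= t]
        right = [ri for xi, ri in zip(x, r) if xi > t]
        lm = sum(left) / len(left)
        rm = sum(right) / len(right)
        sse = (sum((ri - lm) ** 2 for ri in left)
               + sum((ri - rm) ** 2 for ri in right))
        if best is None or sse < best[0]:
            best = (sse, t, lm, rm)
    return best[1], best[2], best[3]


def boost_fit(x, y, n_trees=200, shrinkage=0.1):
    """Fit an additive model of stumps by repeatedly fitting residuals."""
    f0 = sum(y) / len(y)              # start from the mean of y
    pred = [f0] * len(y)
    stumps = []
    for _ in range(n_trees):
        # For squared-error loss the negative gradient is the residual.
        resid = [yi - pi for yi, pi in zip(y, pred)]
        t, lm, rm = fit_stump(x, resid)
        stumps.append((t, lm, rm))
        # Add only a fraction (shrinkage) of the new stump to the model.
        pred = [pi + shrinkage * (lm if xi <= t else rm)
                for xi, pi in zip(x, pred)]
    return f0, shrinkage, stumps


def boost_predict(model, x_new):
    """Evaluate the fitted additive model at new predictor values."""
    f0, shrinkage, stumps = model
    return [f0 + shrinkage * sum(lm if xi <= t else rm for t, lm, rm in stumps)
            for xi in x_new]


def mse(y, pred):
    """Mean squared error between observed and predicted values."""
    return sum((yi - pi) ** 2 for yi, pi in zip(y, pred)) / len(y)


# Toy data (assumed for illustration): a smooth nonlinear relationship.
x = [i / 10.0 for i in range(50)]
y = [xi ** 2 for xi in x]

model = boost_fit(x, y)
baseline = mse(y, [sum(y) / len(y)] * len(y))   # error of the mean-only model
boosted = mse(y, boost_predict(model, x))
print(baseline, boosted)
```

The shrinkage factor slows learning so that many small steps are combined, which is the usual reason boosted models need a large number of trees. The comparison printed at the end is on training data only; a proper evaluation would use a held-out test set, as the abstract does.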
Related resources
Boosting methodology for regression problems
Classification problems have dominated research on boosting to date. The application of boosting to regression problems, on the other hand, has received little investigation. In this paper we develop a new boosting method for regression problems. We cast the regression problem as a classification problem and apply an interpretable form of the boosted naïve Bayes classifier. This induces a regre...
Obtaining Calibrated Probabilities from Boosting
Boosted decision trees typically yield good accuracy, precision, and ROC area. However, because the outputs from boosting are not well calibrated posterior probabilities, boosting yields poor squared error and cross-entropy. We empirically demonstrate why AdaBoost predicts distorted probabilities and examine three calibration methods for correcting this distortion: Platt Scaling, Isotonic Regre...
Boosting Random Forests to Reduce Bias; One-Step Boosted Forest and its Variance Estimate
In this paper we propose using the principle of boosting to reduce the bias of a random forest prediction in the regression setting. From the original random forest fit we extract the residuals and then fit another random forest to these residuals. We call the sum of these two random forests a one-step boosted forest. We have shown with simulated and real data that the one-step boosted forest h...
Boosted Aggregate Models in Commerce and Industry: An Introduction and Case Study
Boosting is a means of constructing a strong classifier by aggregating a sequence of weaker classifiers. The sequence evolves by focusing the attention of a base learning algorithm on remaining examples that are hardest to classify. Boosting algorithms can thus be used to enhance the performance of a simple classifier (typically a regression tree or tree stump) in an automatic way. This makes boos...
Incorporating Boosted Regression Trees into Ecological Latent Variable Models
Important ecological phenomena are often observed indirectly. Consequently, probabilistic latent variable models provide an important tool, because they can include explicit models of the ecological phenomenon of interest and the process by which it is observed. However, existing latent variable methods rely on hand-formulated parametric models, which are expensive to design and require extensiv...