Parallel Bayesian Additive Regression Trees
نویسندگان
چکیده
Bayesian Additive Regression Trees (BART) is a Bayesian approach to flexible non-linear regression which has been shown to be competitive with the best modern predictive methods such as those based on bagging and boosting. BART offers some advantages. For example, the stochastic search Markov Chain Monte Carlo (MCMC) algorithm can provide a more complete search of the model space and variation across MCMC draws can capture the level of uncertainty in the usual Bayesian way. The BART prior is robust in that reasonable results are typically obtained with a default prior specification. However, the publicly available implementation of the BART algorithm in the R package BayesTree is not fast enough to be considered interactive with over a thousand observations, and is unlikely to even run with 50,000 to 100,000 observations. In this paper we show how the BART algorithm may be modified and then computed using single program, multiple data (SPMD) parallel computation implemented using the Message Passing Interface (MPI) library. The approach scales linearly in the number of processor cores, enabling the practitioner to perform statistical inference on massive datasets. Our approach can also handle datasets too massive to fit on any single data repository.
منابع مشابه
bartMachine: Machine Learning with Bayesian Additive Regression Trees
We present a new package in R implementing Bayesian additive regression trees (BART). The package introduces many new features for data analysis using BART such as variable selection, interaction detection, model diagnostic plots, incorporation of missing data and the ability to save trees for future prediction. It is significantly faster than the current R implementation, parallelized, and cap...
متن کاملParticle Gibbs for Bayesian Additive Regression Trees
Additive regression trees are flexible nonparametric models and popular off-the-shelf tools for real-world non-linear regression. In application domains, such as bioinformatics, where there is also demand for probabilistic predictions with measures of uncertainty, the Bayesian additive regression trees (BART) model, introduced by Chipman et al. (2010), is increasingly popular. As data sets have...
متن کاملThe Bayesian Additive Classification Tree applied to credit risk modelling
We propose a new nonlinear classification method based on a Bayesian “sum-of-trees” model, the Bayesian Additive Classification Tree (BACT), which extends the Bayesian Additive Regression Tree (BART) method into the classification context. Like BART, the BACT is a Bayesian nonparametric additive model specified by a prior and a likelihood in which the additive components are trees, and it is fi...
متن کاملPrediction with Missing Data via Bayesian Additive Regression Trees
We present a method for incorporating missing data into general forecasting problems which use non-parametric statistical learning. We focus on a tree-based method, Bayesian Additive Regression Trees (BART), enhanced with “Missingness Incorporated in Attributes,” an approach recently proposed for incorporating missingness into decision trees. This procedure extends the native partitioning mecha...
متن کاملBayesian Additive Regression Trees using Bayesian model averaging
Bayesian Additive Regression Trees (BART) is a statistical sum of trees model. It can be considered a Bayesian version of machine learning tree ensemble methods where the individual trees are the base learners. However for datasets where the number of variables p is large (e.g. p > 5, 000) the algorithm can become prohibitively expensive, computationally. Another method which is popular for hig...
متن کامل